Chapter 1


Basic Statistical Concepts 


1.2 Populations and Samples 


1. (a) The population consists of the customers who bought a car during the previous
year.

(b) The population is not hypothetical.

2. (a) There are three populations, one for each variety of corn. Each variety of corn
that has been and will be planted on all kinds of plots makes up the population.


(b) The characteristic of interest is the yield of each variety of corn at the time of 
harvest. 


(c) There are three samples, one for each variety of corn. Each variety of corn that
was planted on the 10 randomly selected plots makes up the sample.

3. (a) There are two populations, one for each shift. The cars that have been and
will be produced on each shift make up the population.

(b) The populations are hypothetical. 


(c) The characteristic of interest is the number of nonconformances per car. 


4. (a) The population consists of all domestic flights, past or future.

(b) The sample consists of the 175 domestic flights. 

(c) The characteristic of interest is the air quality, quantified by the degree of 
staleness. 

5. (a) There are two populations, one for each teaching method.


(b) Each population consists of all students who took or will take a statistics course
for engineering using one of the two teaching methods.


(c) The populations are hypothetical. 


(d) The samples consist of the students whose scores will be recorded at the end 
of the semester. 

1.3 Some Sampling Concepts

1. The second choice provides a closer approximation to a simple random sample.


2. (a) It is not a simple random sample.


(b) In (a), the members of the population do not all have an equal chance of being
selected, so it is not a simple random sample. Instead, the method described
in (a) is stratified sampling.


3. (a) The population includes all the drivers in the university town.

(b) The student's classmates do not constitute a simple random sample.

(c) It is a convenience sample.

(d) Young college students are not experienced drivers, so they tend to use seat
belts less. Consequently, the sample in this problem will underestimate the
proportion.


4. We identify each person with a number from 1 to 70. Then we write each number 
from 1 to 70 on separate, identical slips of paper, put all 70 slips of paper in a box, 
and mix them thoroughly. Finally, we select 15 slips from the box, one at a time, 
without replacement. The 15 selected numbers specify the desired sample of size 
n = 15 from the 70 iPhones. The R command is 


y = sample(seq(1,70), size=15) 


A sample set is 52 8 14 48 62 6 70 35 18 20 3 41 50 27 40. 


5. We identify each pipe with a number from 1 to 90. Then we write each number 
from 1 to 90 on separate, identical slips of paper, put all 90 slips of paper in a box, 
and mix them thoroughly. Finally, we select 5 slips from the box, one at a time, 
without replacement. The 5 selected numbers specify the desired sample of size 
n = 5 from the 90 drain pipes. The R command is 


y = sample(seq(1,90), size=5)


A sample set is 7 38 65 71 57. 


6. (a) We identify each client with a number from 1 to 1000. Then we write each 
number from 1 to 1000 on separate, identical slips of paper, put all 1000 slips 
of paper in a box, and mix them thoroughly. Finally, we select 100 slips from 
the box, one at a time, without replacement. The 100 selected numbers specify 
the desired sample of size n = 100 from the 1000 clients. 

(b) Using stratified sampling: Get a simple random sample of size 80 from the
sub-population of Caucasian-Americans, a simple random sample of size 15 
from the sub-population of African-Americans, and a simple random sample 
of size 5 from the sub-population of Hispanic-Americans. Then combine the 
three subsamples together. 


(c) The R command for part (a) is

y = sample(seq(1,1000), size=100)

and the R commands for part (b) are

y1 = sample(seq(1,800), size=80)
y2 = sample(seq(801,950), size=15)
y3 = sample(seq(951,1000), size=5)
y = c(y1, y2, y3)


7. One method is to take a simple random sample of size n from the population of 
N customers (of all dealerships of that car manufacturer) who bought a car the 
previous year. 


The second method is to divide the population of the previous year’s customers into 
three strata according to the type of car each customer bought and perform stratified 
sampling with proportional allocation of sample sizes. That is, if $N_1$, $N_2$, $N_3$ denote
the sizes of the three strata, take simple random samples of approximate sizes (due
to round-off) $n_1 = n(N_1/N)$, $n_2 = n(N_2/N)$, $n_3 = n(N_3/N)$, respectively, from
each of the three strata. Stratified sampling ensures that the sample representation
of the three strata equals their population representation.
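
The proportional allocation described above can be sketched in R as follows. This is only an illustration: the stratum sizes, the total sample size, and the seed are hypothetical values that do not come from the exercise, and the units are assumed to be labeled so that each stratum occupies a contiguous block of labels.

```r
# Sketch: stratified sampling with proportional allocation.
# N_h, n, and the seed are hypothetical, chosen only for illustration.
set.seed(1)                          # arbitrary seed, for reproducibility
N_h <- c(500, 300, 200)              # hypothetical stratum sizes N1, N2, N3
n   <- 50                            # hypothetical total sample size
n_h <- round(n * N_h / sum(N_h))     # n_h = n * (N_h / N), rounded

# Label units 1..N so that stratum h occupies labels lower[h]..upper[h].
upper <- cumsum(N_h)
lower <- upper - N_h + 1

# Simple random sample (without replacement) within each stratum.
strat_sample <- unlist(lapply(seq_along(N_h), function(h) {
  sample(seq(lower[h], upper[h]), size = n_h[h])
}))

print(n_h)                 # allocated sample sizes per stratum
print(length(strat_sample))
```

With these hypothetical sizes the rounded allocations happen to add up to n exactly; in general, rounding can leave the total off by one or two units, which is the "round-off" the solution mentions.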


8. It is not a simple random sample because products from facility B have a smaller 
chance to be selected than products from facility A. 


9. No, because the method excludes samples consisting of $n_1$ cars from the first shift
and $n_2 = 9 - n_1$ from the second shift for any $(n_1, n_2)$ different from $(6, 3)$.


1.4 Random Variables and Statistical Populations 


1. (a) The variable of interest is the number of scratches in each plate. The statistical 
population consists of 500 numbers, 190 zeros, 160 ones, and 150 twos. 


(b) The variable of interest is quantitative. 
(c) The variable of interest is univariate. 
2. (a) Statistical population: If there are N undergraduate students enrolled at PSU,
the statistical population is a list of length N and the i-th element in the list is
the major of the i-th student. The variable of interest is qualitative. Another
possible variable: gender.

(b) Statistical population: If there are N restaurants on campus, the statistical
population consists of a list of N numbers, and the i-th element is the capacity
of the i-th restaurant. The variable of interest is quantitative. Another possible
variable: food type.


(c) Statistical population: If there are N books in Penn State libraries, the statistical
population consists of a list of N numbers, and the i-th element is the
check-out frequency of the i-th book in the library. The variable of interest is
quantitative. Another possible variable: pages of the book.


(d) Statistical population: If there are N steel cylinders made in the given month, 
the population consists of a list of N numbers, and the i-th element is the 
diameter of the i-th steel cylinder made in the given month. The variable of 
interest is quantitative. Another possible variable: weight. 


3. (a) The variable of interest is univariate.


(b) The variable of interest is quantitative.


(c) If N is the number of cars available for inspection, the statistical population
consists of N numbers, $\{v_1, \ldots, v_N\}$, where $v_i$ is the total number of engine
and transmission nonconformances of the i-th car.


(d) If the number of nonconformances in the engine and transmission are recorded 
separately for each car, the new variable would be bivariate. 


4. (a) The variable of interest is the degree of staleness. The statistical population consists
of a list of 175 numbers, and the i-th number is the degree of staleness of the
air in the i-th domestic flight.


(b) The variable of interest is quantitative. 

(c) The variable of interest is univariate. 

5. (a) The variable of interest is the type of car a customer bought and his/her
satisfaction level. Statistical population: If there are N customers who bought
a new car in the previous year, the statistical population is a list of N elements,
and the i-th element is the car type the i-th customer bought along with his/her
satisfaction level, which is a number from 1 to 6.


(b) The variable of interest is bivariate. 


(c) The variable of interest has two components. The first is qualitative and the 
second is quantitative. 


1.5 Basic Graphics for Data Visualization 


1. The histogram produced by the commands is shown below:

[Figure: Histogram of Str]

The stem and leaf plot is as follows:

The decimal point is at the |

41 | 5
42 | 39
43 | 1445788
45 | 1446
46 | 00246
47 | 3577
49 | 3


2. The histogram of the waiting times is shown below.


Copyright © 2016 Pearson Education, Inc.


[Figure: Histogram of waiting]

The corresponding stem and leaf plot is given below. It is clear that the shape of
the stem and leaf plot is similar to that of the histogram. 


The decimal point is 1 digit(s) to the right of the | 

The histogram with a title and a colored smooth curve superimposed is shown below.

[Figure: Waiting times before Eruption the Old Faithful Geyser]

3. The scatterplot is shown below. From the scatterplot, it seems that the longer the
waiting time before eruption, the longer the duration.

4. (a) The scatterplot matrix is given below. From the figure, it seems that the latitude
is a better predictor of the temperature because as the latitude changes,
the temperature shows a clear pattern, while there is no pattern as the longitude
changes.

(b) The following figure gives the 3D scatter plot. The 3D scatter plot also suggests
that the latitude is a better predictor for the temperature.

[Figure: 3D scatter plot; axis label: JanTemp]

5. The 3D scatterplot is shown below.

6. The scatterplot is shown below. From the scatterplot, it is clear that, in general,
the higher the speed, the larger the braking distance.


[Figure: scatterplot; horizontal axis: speed]


7. The required graph is given below: 

8. The resulting graph is given below. The figure shows that for SMaple and WOak,
the growth rate in terms of the diameter of the tree is constant, while ShHickory
grows faster as the tree gets older.

9. (a) The basic histogram with smooth curve superimposed:

[Figure: Histogram of t1]

(b) The stem and leaf plot for the reaction time of Robot 1 is given below.

The decimal point is at the |


28 | 4 

29 | 0133688 
30 | 03388 
31 | 0234669 
32 | 47 


10. The basic scatter plot is given below. It seems that the surface conductivity
can be used for predicting the sediment conductivity.

11. The basic scatter plot is given below. It seems that the rainfall volume
is useful for predicting the runoff volume.

12. The scatterplot matrix is shown below; it seems that temperature is the better
single predictor of the amount of electricity consumed.

13. The scatterplot matrix is shown below.

[Figure: scatterplot matrix]

According to the scatterplot matrix, the answers are as follows:


(a) Yes. 
(b) No. 


(c) When there is increased solar radiation, the ozone level is more likely to increase,
but the variability also increases.


(d) August. 

14. The scatterplot matrix is shown below.

The scatterplot matrix is shown below.

From these figures, it seems that the variables auxin and kinetin are not good
predictors of the callus weight.

15. The 3D scatterplot is given below:

[Figure: 3D scatterplot; axes: mpg, disp, wt]

The resulting 3D scatterplot after replacing box=T by box=F:

We can clearly see the difference.

16. The bar graph is shown below.

The pie chart follows.

17. (a) The bar graph is shown below.

The pie chart follows.

(b) The figure is shown below.


1.6 Proportions, Averages, and Variances


1. $\hat{p} = 4/14 = 0.286$. It estimates the proportion of time when the ozone level is below
2950.


They estimate the proportion of all concrete cylinders, constructed by the specifi- 
cations listed, whose 28-day compressive-strength measure is no more than 44, and 
at least 47, respectively. 

(a) $\sigma = 0.877$, $\sigma^2 = 0.769$.

(b) $S = 0.949$, $S^2 = 0.9$.


After repeating the commands five times, we obtain the five pairs of $(\bar{x}, S^2)$ as (3.28,
0.90), (3.34, 0.64), (3.52, 0.50), (3.38, 0.73), and (3.32, 0.83).


(a) $\mu = 0.92$, $\sigma = 0.8207$, $\sigma^2 = 0.6736$.


(b) $\bar{x} = 0.91$, $S = 0.8177$, $S^2 = 0.6686$.


(a) After running the commands five times, we obtain the results (0.44, 0.31,
0.25), (0.27, 0.33, 0.40), (0.38, 0.33, 0.29), (0.34, 0.38, 0.28), and (0.39, 0.30,
0.31). Each result gives an estimate of the population proportions;
for example, the first gives the estimated proportions of 0, 1, and 2 as 0.44,
0.31, and 0.25, respectively.


(b) After running the commands five times, we obtain the results (0.87, 0.62,
0.79), (0.94, 0.62, 0.79), (1.06, 0.66, 0.81), (1.09, 0.65, 0.81), and (0.94, 0.70,
0.84). Each of the above results gives the estimated values of $\mu$, $\sigma^2$, and $\sigma$.


(a) $\mu_X = 3.5$, $\sigma_X^2 = 2.92$.


(b) After running the commands five times, we obtain the following results (3.52, 
2.64), (3.49, 2.70), (3.43, 3.03), (3.37, 3.10), (3.74, 3.00). We can observe that 
the sample mean and sample variance approximate the population mean and 
population variance reasonably well. 


(c) After running the commands five times, we obtain the following sample proportions:
(0.12, 0.20, 0.13, 0.25, 0.12, 0.18), (0.19, 0.17, 0.17, 0.21, 0.17, 0.09),
(0.20, 0.11, 0.17, 0.17, 0.17, 0.18), (0.14, 0.14, 0.19, 0.18, 0.19, 0.16), and (0.13,
0.28, 0.12, 0.18, 0.14, 0.15). They are reasonably close to 1/6.


(a) $\sigma^2 = 0.25$.

(b) $S_1^2 = 0$, $S_2^2 = 0.5$, $S_3^2 = 0.5$, $S_4^2 = 0$.

(c) $E(Y) = (0 + 0.5 + 0.5 + 0)/4 = 0.25$.

(d) We can see that $\sigma^2 = E(Y)$. If the sample variances in part (b) were computed
according to a formula that divides by $n$ instead of $n - 1$, $E(Y)$ would have
been 0.125.
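
The values in parts (a) through (d) can be reproduced by enumerating all samples. The sketch below assumes, as the four sample variances suggest, that the population is {0, 1} and that samples of size 2 are drawn with replacement; that population is an inference from the numbers above, not something stated in this solution.

```r
# Assumption: population {0, 1}, all ordered samples of size 2 with replacement.
pop <- c(0, 1)
sigma2 <- mean((pop - mean(pop))^2)           # population variance (divides by N)

samples <- expand.grid(x1 = pop, x2 = pop)    # the 4 equally likely samples

# Sample variance with divisor n - 1 (what var() computes).
S2 <- apply(samples, 1, var)

# Sample variance with divisor n instead of n - 1.
S2_n <- apply(samples, 1, function(x) mean((x - mean(x))^2))

print(c(sigma2 = sigma2, mean_S2 = mean(S2), mean_S2_n = mean(S2_n)))
```

Averaging the n - 1 version over all samples recovers the population variance, while the divide-by-n version averages to half of it here.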




(a) Y= 30, Lo = 50): 
(b) S? = 0.465, 2 = 46.5. 


(c) There is more uniformity among cars of type A (smaller variability in achieved 
gas mileage), so type A cars are of better quality. 


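(a) For the mean value:

$$\mu_w = \frac{\sum_{i=1}^N w_i}{N} = \frac{\sum_{i=1}^N (c_1 + v_i)}{N} = \frac{N c_1 + \sum_{i=1}^N v_i}{N} = c_1 + \mu_v.$$

For the variance:

$$\sigma_w^2 = \frac{\sum_{i=1}^N (w_i - \mu_w)^2}{N} = \frac{\sum_{i=1}^N (c_1 + v_i - (c_1 + \mu_v))^2}{N} = \frac{\sum_{i=1}^N (v_i - \mu_v)^2}{N} = \sigma_v^2.$$

Consequently, $\sigma_w = \sqrt{\sigma_w^2} = \sqrt{\sigma_v^2} = \sigma_v$.

(b) For the mean value:

$$\mu_w = \frac{\sum_{i=1}^N w_i}{N} = \frac{\sum_{i=1}^N c_2 v_i}{N} = \frac{c_2 \sum_{i=1}^N v_i}{N} = c_2 \mu_v.$$

For the variance:

$$\sigma_w^2 = \frac{\sum_{i=1}^N (w_i - \mu_w)^2}{N} = \frac{\sum_{i=1}^N (c_2 v_i - c_2 \mu_v)^2}{N} = \frac{c_2^2 \sum_{i=1}^N (v_i - \mu_v)^2}{N} = c_2^2 \sigma_v^2.$$

Consequently, $\sigma_w = \sqrt{\sigma_w^2} = \sqrt{c_2^2 \sigma_v^2} = |c_2| \sigma_v$.

(c) Let $u_i = c_2 v_i$ for $i = 1, 2, \ldots, N$. Then we have $w_i = c_1 + u_i$ for $i = 1, 2, \ldots, N$.
From part (b), we have

$$\mu_u = c_2 \mu_v, \quad \sigma_u^2 = c_2^2 \sigma_v^2, \quad \sigma_u = |c_2| \sigma_v.$$

From part (a),

$$\mu_w = c_1 + \mu_u = c_1 + c_2 \mu_v, \quad \sigma_w^2 = \sigma_u^2 = c_2^2 \sigma_v^2, \quad \sigma_w = \sigma_u = |c_2| \sigma_v.$$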


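13. (a) For the mean value:

$$\bar{y} = \frac{\sum_{i=1}^n y_i}{n} = \frac{\sum_{i=1}^n (c_1 + x_i)}{n} = \frac{n c_1 + \sum_{i=1}^n x_i}{n} = c_1 + \bar{x}.$$

For the variance:

$$S_y^2 = \frac{\sum_{i=1}^n (y_i - \bar{y})^2}{n-1} = \frac{\sum_{i=1}^n (c_1 + x_i - (c_1 + \bar{x}))^2}{n-1} = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n-1} = S_x^2.$$

Consequently, $S_y = \sqrt{S_y^2} = \sqrt{S_x^2} = S_x$.

(b) For the mean value:

$$\bar{y} = \frac{\sum_{i=1}^n y_i}{n} = \frac{\sum_{i=1}^n c_2 x_i}{n} = \frac{c_2 \sum_{i=1}^n x_i}{n} = c_2 \bar{x}.$$

For the variance:

$$S_y^2 = \frac{\sum_{i=1}^n (y_i - \bar{y})^2}{n-1} = \frac{\sum_{i=1}^n (c_2 x_i - c_2 \bar{x})^2}{n-1} = \frac{c_2^2 \sum_{i=1}^n (x_i - \bar{x})^2}{n-1} = c_2^2 S_x^2.$$

Consequently, $S_y = \sqrt{S_y^2} = \sqrt{c_2^2 S_x^2} = |c_2| S_x$.

(c) Let $u_i = c_2 x_i$ for $i = 1, 2, \ldots, n$. Then we have $y_i = c_1 + u_i$ for $i = 1, 2, \ldots, n$.
From part (b), we have

$$\bar{u} = c_2 \bar{x}, \quad S_u^2 = c_2^2 S_x^2, \quad S_u = |c_2| S_x.$$

From part (a),

$$\bar{y} = c_1 + \bar{u} = c_1 + c_2 \bar{x}, \quad S_y^2 = S_u^2 = c_2^2 S_x^2, \quad S_y = S_u = |c_2| S_x.$$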

14. Let $x_i$, $i = 1, \ldots, 7$, be the temperature expressed in Celsius scale, and let $y_i$,
$i = 1, \ldots, 7$, be the temperature expressed in Fahrenheit scale. Then $y_i = 1.8 x_i + 32$.
From the given information, $\bar{x} = 31$ and $S_x = 1.5$. By the results in 13 (c), we have

$$\bar{y} = 1.8 \bar{x} + 32 = 1.8 \times 31 + 32 = 87.8, \quad S_y = |1.8| S_x = 1.8 \times 1.5 = 2.7.$$
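
As a numerical check of the rule $\bar{y} = c_1 + c_2 \bar{x}$, $S_y = |c_2| S_x$, the following sketch applies the Celsius-to-Fahrenheit conversion to a small data vector. The seven readings are hypothetical and are not the data of the exercise.

```r
# Hypothetical Celsius readings (not the exercise's data).
x <- c(29.5, 30.0, 31.0, 31.5, 32.0, 33.0, 30.5)
y <- 1.8 * x + 32                      # Fahrenheit: c1 = 32, c2 = 1.8

# Each comparison should print TRUE (up to floating-point tolerance).
mean_ok <- isTRUE(all.equal(mean(y), 32 + 1.8 * mean(x)))
sd_ok   <- isTRUE(all.equal(sd(y), abs(1.8) * sd(x)))
var_ok  <- isTRUE(all.equal(var(y), 1.8^2 * var(x)))
print(c(mean_ok = mean_ok, sd_ok = sd_ok, var_ok = var_ok))
```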


15. Let the coded data be y; = (a; — 81.2997) x 10000, for 7 = 1,--- ,7. Thus, by the 
result of 13 (c), $2 = 10000753 = 10°.S?2. Therefore, $2 = 10-*.S? = 6.833 x 10~". 


16. (a) The estimated population mean is $\bar{x} = 192.8$ and the estimated population
variance is $S_x^2 = 312.31$.


(b) Let $y_i$ be the second-year salary, for $i = 1, \ldots, 15$.


(i) Since $y_i = x_i + 5$, $\bar{y} = \bar{x} + 5 = 197.8$ and $S_y^2 = S_x^2 = 312.31$.
(ii) Since $y_i = 1.05 x_i$, $\bar{y} = 1.05 \bar{x} = 202.44$ and $S_y^2 = 1.05^2 S_x^2 = 344.33$.


1.7 Medians, Percentiles, and Boxplots 


1. (a) The sample median is $\tilde{x} = 717$, the 25th percentile is $q_1 = (691 + 699)/2 = 695$,
and the 75th percentile is $q_3 = (734 + 734)/2 = 734$.


(b) The sample interquartile range is IQR = q3 — q, = 734 — 695 = 39. 


(c) The sample percentile is 100 x (19 — 0.5)/40 = 46.25. 


2. (a) The sample median is $\tilde{x} = 30.55$, the 25th percentile is $q_1 = 29.59$, and the
75th percentile is $q_3 = 31.41$.


(b) The sample interquartile range is IQR $= q_3 - q_1 = 31.41 - 29.59 = 1.82$.


(c) The sample percentile is 100 x (19 — 0.5)/22 = 84.09. 
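
The percentile computations in 1(c) and 2(c) both use the formula 100(i - 0.5)/n for the i-th smallest of n observations. A minimal sketch follows; the function name pct_rank is made up for this illustration.

```r
# Sample percentile corresponding to the i-th order statistic out of n.
# pct_rank is a hypothetical helper name, not from the text.
pct_rank <- function(i, n) 100 * (i - 0.5) / n

p1 <- pct_rank(19, 40)             # exercise 1(c)
p2 <- round(pct_rank(19, 22), 2)   # exercise 2(c), rounded to two decimals
print(c(p1, p2))
```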


3. (a) After running the code, we obtain $x_{(1)} = 28.97$, $q_1 = 29.30$,
$\tilde{x} = 29.94$, $q_3 = 30.82$, and $x_{(n)} = 32.23$.

(b) The 90th percentile is 31.068.


(c) The boxplot is shown below.

[Figure: boxplot]


Clearly, there are no outliers. 


4. (a) The boxplot is shown as follows 


(b) The 30th, 60th, and 90th sample percentiles are 700.7, 720.8, and 746.0,
respectively.


1.8 Comparative Studies


1. (a) The experimental units are the batches of cake. 


(b) The factors are baking time and temperature. 

(c) The levels for baking time are 25 and 30 minutes, and the levels for temperature
are 275°F, 300°F, and 325°F.


(d) All the treatments are (25, 275), (25, 300), (25, 325), (30, 275), (30, 300), (30,
325).


(e) The response variable is qualitative.


2. (a) There are three populations involved in this study.

(b) True

(c) False

(d) In this study, each of the three watering regimens is considered as a treatment.

(e) With the changes in the study:

(i) This will change the number of populations.

(ii) Watering regimen with levels $W_1$, $W_2$, $W_3$, and location with levels $L_1$, $L_2$, $L_3$.
The treatments are all $(W_i, L_j)$ where $i = 1, 2, 3$ and $j = 1, 2, 3$.


3. (a) Let $\bar{\mu} = (\mu_1 + \mu_2 + \mu_3 + \mu_4 + \mu_5)/5$; then the contrasts that represent the effects
of each area are $\alpha_i = \mu_i - \bar{\mu}$, for $i = 1, \ldots, 5$.


(b) The contrast is $(\mu_1 + \mu_2)/2 - (\mu_3 + \mu_4 + \mu_5)/3$.


4. The comparative boxplot is shown below.

[Figure: comparative boxplot; groups: Control, Seeded]


The comparative boxplot shows that, in general, the seeded clouds could produce 
more rainfall than the unseeded clouds. 


5. The three control versus treatment contrasts are $\mu_2 - \mu_1$, $\mu_3 - \mu_1$, and $\mu_4 - \mu_1$.

6. (a) There are four populations involved in this study.

(b) In this study, each of the four new types of paint is considered as a treatment.


(c) The three control versus treatment contrasts are $\mu_2 - \mu_1$, $\mu_3 - \mu_1$, and $\mu_4 - \mu_1$.


7. (a) This will change the number of populations.


(b) Paint type with levels $T_1, \ldots, T_4$, and location with levels $L_1, \ldots, L_4$. The
treatments are all $(T_i, L_j)$ where $i = 1, \ldots, 4$ and $j = 1, \ldots, 4$.


8. The comparative boxplot is given below; it shows that the type B material,
on average, has a higher ignition time than the type A material.

[Figure: comparative boxplot]


9. The comparative boxplot is given below, and it shows that the male bears, on 
average, are heavier than female bears, but the weights of male bears are spread in 
a wider range. 

10. The comparative bar graph is shown in the following figure. “Weather” is the
reason for being late to work that differs most between the two cities.

[Figure: comparative bar graph of reasons for being late to work (Traffic, ChildCare, PubTransp, Weather, Overslept, Other); legend: Boston, Buffalo]

11. (a) The comparative bar graph for online and catalog volumes of sale is as follows:

[Figure: comparative bar graph by month; legend: Online, Catalog]

(b) The stacked bar graph for online and catalog volumes of sale is as follows:

[Figure: stacked bar graph by month; legend: Catalog, Online]


(c) The comparative bar graph is better for comparing the volume of sales for 
online and catalog, while the stacked bar graph is better for showing variation 
in the total volume of sales. 



12. The watering and location effects will be confounded. The three watering regimens
should be employed in each location. The root systems in each location should be
assigned randomly to a watering regimen.


13. The paint and location effects will be confounded. The four types of new paint
should be used in each location. The road segments should be assigned randomly
to a new type of paint.

14. (a) There are four populations in this study.

(b) True

(c) False

(d) The factor fertilization has two levels, $F_1$ and $F_2$, and the factor watering has
two levels, $W_1$ and $W_2$.

(e) True

15. (a) Of 2590 male applicants, about 1192 were admitted. Similarly, of the 1835
female applicants, about 557 were admitted. Thus, the admission rates for
men and women are 0.46 and 0.30, respectively.


(b) Yes 


(c) No, because the major specific admission rates are higher for women for most 
majors. 


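16. (a) This is not an additive design because the Pygmalion effect is stronger for
female recruits.


(b) Here, $\bar{\mu}_{\cdot\cdot} = (8 + 13 + 10 + 12)/4 = 10.75$. Therefore, the main gender effects
are

$\alpha_F = \bar{\mu}_{F\cdot} - \bar{\mu}_{\cdot\cdot} = (8 + 13)/2 - 10.75 = -0.25$

and

$\alpha_M = \bar{\mu}_{M\cdot} - \bar{\mu}_{\cdot\cdot} = (10 + 12)/2 - 10.75 = 0.25$.

The main Pygmalion effects are

$\beta_C = \bar{\mu}_{\cdot C} - \bar{\mu}_{\cdot\cdot} = (8 + 10)/2 - 10.75 = -1.75$

and

$\beta_P = \bar{\mu}_{\cdot P} - \bar{\mu}_{\cdot\cdot} = (13 + 12)/2 - 10.75 = 1.75$.


(c) The interaction effects are computed as follows:

$\gamma_{FC} = \mu_{FC} - (\bar{\mu}_{\cdot\cdot} + \alpha_F + \beta_C) = 8 - (10.75 - 0.25 - 1.75) = -0.75$,
$\gamma_{FP} = \mu_{FP} - (\bar{\mu}_{\cdot\cdot} + \alpha_F + \beta_P) = 13 - (10.75 - 0.25 + 1.75) = 0.75$,
$\gamma_{MC} = \mu_{MC} - (\bar{\mu}_{\cdot\cdot} + \alpha_M + \beta_C) = 10 - (10.75 + 0.25 - 1.75) = 0.75$,
$\gamma_{MP} = \mu_{MP} - (\bar{\mu}_{\cdot\cdot} + \alpha_M + \beta_P) = 12 - (10.75 + 0.25 + 1.75) = -0.75$.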
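
The same main effects and interactions can be obtained from the 2 x 2 matrix of cell means with the rowMeans/colMeans pattern that this manual uses in later exercises. The sketch below is illustrative only; the dimension labels (F, M, C, P) are chosen here for readability.

```r
# Cell means: rows = gender (F, M), columns = group (C = control, P = Pygmalion).
mcm <- matrix(c(8, 13,
                10, 12), nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("F", "M"), Group = c("C", "P")))

mu     <- mean(mcm)                 # overall mean, 10.75
alphas <- rowMeans(mcm) - mu        # main gender effects
betas  <- colMeans(mcm) - mu        # main Pygmalion effects

# Interactions: cell mean minus (overall mean + row effect + column effect).
# Subtracting alphas recycles down columns (one value per row); the double
# transpose lets betas be subtracted per column in the same way.
gammas <- t(t(mcm - mu - alphas) - betas)

print(alphas); print(betas); print(gammas)
```

Nonzero entries in gammas are what the solution to part (a) refers to when it says the design is not additive.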

17. (a) Omitted


(b) The interaction plot shows that the traces are not parallel; therefore, there is
interaction between pH and temperature.


(c) $\bar{\mu}_{\cdot\cdot} = (108 + 103 + 101 + 100 + 111 + 104 + 100 + 98)/8 = 103.125$. Therefore,
the main pH effects are

$\alpha_I = \bar{\mu}_{I\cdot} - \bar{\mu}_{\cdot\cdot} = (108 + 103 + 101 + 100)/4 - 103.125 = -0.125$

and

$\alpha_{II} = \bar{\mu}_{II\cdot} - \bar{\mu}_{\cdot\cdot} = (111 + 104 + 100 + 98)/4 - 103.125 = 0.125$.

The main temperature effects are

$\beta_A = \bar{\mu}_{\cdot A} - \bar{\mu}_{\cdot\cdot} = (108 + 111)/2 - 103.125 = 6.375$,
$\beta_B = \bar{\mu}_{\cdot B} - \bar{\mu}_{\cdot\cdot} = (103 + 104)/2 - 103.125 = 0.375$,
$\beta_C = \bar{\mu}_{\cdot C} - \bar{\mu}_{\cdot\cdot} = (101 + 100)/2 - 103.125 = -2.625$,

and

$\beta_D = \bar{\mu}_{\cdot D} - \bar{\mu}_{\cdot\cdot} = (100 + 98)/2 - 103.125 = -4.125$.


(d) The interaction effects are computed as follows:

$\gamma_{IA} = \mu_{IA} - (\bar{\mu}_{\cdot\cdot} + \alpha_I + \beta_A) = 108 - (103.125 + (-0.125) + 6.375) = -1.375$,
$\gamma_{IB} = \mu_{IB} - (\bar{\mu}_{\cdot\cdot} + \alpha_I + \beta_B) = 103 - (103.125 + (-0.125) + 0.375) = -0.375$,
$\gamma_{IC} = \mu_{IC} - (\bar{\mu}_{\cdot\cdot} + \alpha_I + \beta_C) = 101 - (103.125 + (-0.125) + (-2.625)) = 0.625$,
$\gamma_{ID} = \mu_{ID} - (\bar{\mu}_{\cdot\cdot} + \alpha_I + \beta_D) = 100 - (103.125 + (-0.125) + (-4.125)) = 1.125$,
$\gamma_{IIA} = \mu_{IIA} - (\bar{\mu}_{\cdot\cdot} + \alpha_{II} + \beta_A) = 111 - (103.125 + 0.125 + 6.375) = 1.375$,
$\gamma_{IIB} = \mu_{IIB} - (\bar{\mu}_{\cdot\cdot} + \alpha_{II} + \beta_B) = 104 - (103.125 + 0.125 + 0.375) = 0.375$,
$\gamma_{IIC} = \mu_{IIC} - (\bar{\mu}_{\cdot\cdot} + \alpha_{II} + \beta_C) = 100 - (103.125 + 0.125 + (-2.625)) = -0.625$,

and

$\gamma_{IID} = \mu_{IID} - (\bar{\mu}_{\cdot\cdot} + \alpha_{II} + \beta_D) = 98 - (103.125 + 0.125 + (-4.125)) = -1.125$.


18. (a) The R code is as follows:
SMT=read.table("SpruceMothTrap.txt", header=T)
mcm=tapply(SMT$Moth, SMT[,c(1, 3)], mean)
alphas=rowMeans(mcm)-mean(mcm)
betas=colMeans(mcm)-mean(mcm)
gammas=t(t(mcm-mean(mcm)-alphas)-betas)

The computed matrix of cell means is 

          Lure
Location  Chemical  Scent     Sugar
ground    26.57143  24.28571  28.14286
lower     42.71429  38.57143  37.57143
middle    37.28571  34.28571  41.57143
top       30.42857  29.14286  32.57143


The computed main effects for ground, lower, middle, and top are -7.261905,
6.023810, 4.119048, and -2.880952, respectively, while the computed main effects
for Chemical, Scent, and Sugar are 0.6547619, -2.0238095, and 1.3690476,
respectively.

The computed interaction effects are


          Lure
Location  Chemical    Scent        Sugar
ground    -0.4166667  -0.02380952   0.4404762
lower      2.4404762   0.97619048  -3.4166667
middle    -1.0833333  -1.40476190   2.4880952
top       -0.9404762   0.45238095   0.4880952


(b) The R commands for the interaction plot are as follows:
attach(SMT) # so variables can be referred to by name
interaction.plot(Lure, Location, Moth, col=c(1,2,3,4), lty=1, xlab="Lure",
ylab="Cell Means of Moth Traps", trace.label="Location")
The interaction plot is given in the following figure. According to this figure,
there are interaction effects.

[Figure: interaction plot; horizontal axis: Lure (Chemical, Scent, Sugar); vertical axis: Cell Means of Moth Traps; traces: Location (middle, lower, top, ground)]

19. (a) The R code is as follows:
ALN=read.table("AdLocNews.txt", header=T)
mcm=tapply(ALN$Inquiries, ALN[,c(1, 3)], mean)
alphas=rowMeans(mcm)-mean(mcm)
betas=colMeans(mcm)-mean(mcm)
gammas=t(t(mcm-mean(mcm)-alphas)-betas)


The computed matrix of cell means is


           Section
Day        Business  News   Sports
Friday     12.00     15.50  14.25
Monday     14.50     11.25   7.50
Thursday   10.75      7.25   9.00
Tuesday    11.75     13.25   9.50
Wednesday  11.50     12.25   9.75

The computed main effects for Friday, Monday, Thursday, Tuesday, and Wednesday
are 2.58, -0.25, -2.33, 0.17, and -0.17, respectively, while the computed
main effects for Business, News, and Sports are 0.77, 0.57, and -1.33, respectively.


The computed interaction effects are


           Section
Day        Business    News        Sports
Friday     -2.6833333   1.0166667   1.66666667
Monday      2.6500000  -0.4000000  -2.25000000
Thursday    0.9833333  -2.3166667   1.33333333
Tuesday    -0.5166667   1.1833333  -0.66666667
Wednesday  -0.4333333   0.5166667  -0.08333333


The overall best day to put a newspaper ad is on Friday, and the overall best 
newspaper section is Business. 


(b) The interaction plot with the levels of the factor day being traced:

The interaction plot with the levels of the factor section being traced:

[Figure: interaction plot; horizontal axis: Day (Friday, Monday, Thursday, Tuesday, Wednesday); traces: Section (News, Business, Sports)]

These plots show that there is an interaction effect between the factors day and
section, and that the (Friday, News) combination has the most inquiries.




Chapter 2 
Introduction to Probability 


2.2 Sample Spaces, Events, and Set Operations 


1. (a) The sample space is $\{(1, 1), (1, 2), \ldots, (1, 6), \ldots, (6, 1), (6, 2), \ldots, (6, 6)\}$.
(b) The sample space is $\{2, 3, 4, \ldots, 12\}$.
(c) The sample space is $\{0, 1, 2, \ldots, 6\}$.
(d) The sample space is $\{1, 2, 3, \ldots\}$.

2. (a) The Venn diagram is shown below.

[Figure: Venn diagram of E1 and E2]

(b) The Venn diagram is shown below.

[Figure: Venn diagram of E1 and E2]

(c) The Venn diagram is shown below.


3. (a) The events are represented as
(i) $T \cap M$
(ii) $T^c \cap M^c$
(iii) $(T \cap M^c) \cup (T^c \cap M)$


(b) The Venn diagrams for part (a) are shown below.

[Figure: Venn diagrams of T and M]

4. Both of the Venn diagrams should be similar to


5. (a) $A^c = \{x \mid x \geq 75\}$, the component will last at least 75 time units.


(b) $A \cap B = \{x \mid 53 < x < 75\}$, the component will last more than 53 time units but less
than 75 time units.


(c) $A \cup B = S$, the sample space.
(d) $(A - B) \cup (B - A) = \{x \mid x \geq 75 \text{ or } x \leq 53\}$, the component will last either at
most 53 or at least 75 time units.


6. Both of the Venn diagrams should be similar to

7. Both of the Venn diagrams should be similar to

[Figure: Venn diagram of A and B]


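8. (a) Prove that $(A - B) \cup (B - A) = (A \cup B) - (A \cap B)$:

$x \in (A - B) \cup (B - A) \Leftrightarrow x \in A - B$ or $x \in B - A$
$\Leftrightarrow x \in A$ but $x \notin B$, or $x \in B$ but $x \notin A$
$\Leftrightarrow x \in A$ or $x \in B$ but not in both
$\Leftrightarrow x \in A \cup B$ and $x \notin A \cap B$
$\Leftrightarrow x \in (A \cup B) - (A \cap B)$.

(b) Prove that $(A \cap B)^c = A^c \cup B^c$:

$x \in (A \cap B)^c \Leftrightarrow x \notin A \cap B$
$\Leftrightarrow x \in A - B$ or $x \in B - A$ or $x \in (A \cup B)^c$
$\Leftrightarrow [x \in A - B$ or $x \in (A \cup B)^c]$ or $[x \in B - A$ or $x \in (A \cup B)^c]$
$\Leftrightarrow x \in B^c$ or $x \in A^c \Leftrightarrow x \in A^c \cup B^c$.

(c) Prove that $(A \cap B) \cup C = (A \cup C) \cap (B \cup C)$:

$x \in (A \cap B) \cup C \Leftrightarrow x \in A \cap B$ or $x \in C$
$\Leftrightarrow x \in A$ and $x \in B$, or $x \in C$
$\Leftrightarrow [x \in A$ or $x \in C]$ and $[x \in B$ or $x \in C]$
$\Leftrightarrow x \in A \cup C$ and $x \in B \cup C$
$\Leftrightarrow x \in (A \cup C) \cap (B \cup C)$.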
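
The three identities can also be spot-checked numerically with base R's set functions. This does not replace the proofs; it only confirms the identities on one example. The universe S and the sets A, B, C below are hypothetical.

```r
# Hypothetical universe and events, chosen only for illustration.
S <- 1:10
A <- 1:5
B <- 4:7
C <- c(1L, 7L, 9L)

comp <- function(E) setdiff(S, E)   # complement relative to S

# (a) symmetric difference equals union minus intersection
id_a <- setequal(union(setdiff(A, B), setdiff(B, A)),
                 setdiff(union(A, B), intersect(A, B)))

# (b) De Morgan: complement of the intersection is the union of the complements
id_b <- setequal(comp(intersect(A, B)), union(comp(A), comp(B)))

# (c) distributive law
id_c <- setequal(union(intersect(A, B), C),
                 intersect(union(A, C), union(B, C)))

print(c(id_a, id_b, id_c))
```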


9. (a) The sample space is $S = \{(x_1, x_2, x_3, x_4, x_5) \mid x_i = 5.3, 5.4, 5.5, 5.6, 5.7,\ i = 1, 2, 3, 4, 5\}$.
The size of the sample space is $5^5 = 3125$.

(b) The sample space is the collection of the distinct averages, $(x_1 + x_2 + x_3 + x_4 + x_5)/5$,
formed from the elements of $S$. The R commands s=c(5.3,5.4,5.5,5.6,5.7);
Sa=expand.grid(x1=s,x2=s,x3=s,x4=s,x5=s);
length(table(rowSums(Sa))) return 21 for the size of the sample space of the
averages.

10. (a) The number of disks in $E_1$ is 5 + 16 = 21, the number of disks in $E_2$ is 5 + 9 = 14,
and the number of disks in $E_3$ is 5 + 16 + 9 = 30.


(b) Both of the Venn diagrams should be similar to


(c) $E_1 \cap E_2$ is the event that “the disk has low hardness and low shock absorption,”
$E_1 \cup E_2$ is the event that “the disk has low hardness or low shock absorption,”
$E_1 - E_2$ is the event that “the disk has low hardness but does not have low
shock absorption,” and $(E_1 - E_2) \cup (E_2 - E_1)$ is the event that “the disk has
low hardness or low shock absorption but does not have low hardness and low
shock absorption at the same time.”


(d) The number of disks in $E_1 \cap E_2$ is 5, the number of disks in $E_1 \cup E_2$ is 30, the
number of disks in $E_1 - E_2$ is 16, and the number of disks in $(E_1 - E_2) \cup (E_2 - E_1)$
is 25.


2.3 Experiments with Equally Likely Outcomes


1. $P(E_1) = 0.5$, $P(E_2) = 0.5$, $P(E_1 \cap E_2) = 0.3$, $P(E_1 \cup E_2) = 0.7$, $P(E_1 - E_2) = 0.2$,
$P((E_1 - E_2) \cup (E_2 - E_1)) = 0.4$.


2. (a) If we select two wafers with replacement, then 


(i) The sample space for the experiment that records the doping type is {(n- 
type, n-type), (n-type, p-type), (p-type, n-type), (p-type, p-type)} and 
the corresponding probabilities are 0.25, 0.25, 0.25, and 0.25. 
(ii) The sample space for the experiment that records the number of n-type 
wafers is {0, 1, 2} and the corresponding probabilities are 0.25, 0.50, and 
0.25. 
(b) If we select four wafers with replacement, then 


(i) The sample space for the experiment that records the doping type is all of 
the 4-component vectors, with each element being n-type or p-type. The 
size of the sample space can be found by the R commands 


Copyright © 2016 Pearson Education, Inc. 


40 Chapter 2 Introduction to Probability 


G=expand.grid(W1=0:1,W2=0:1,W3=0:1,W4=0:1); length(G$W1)
and the result is 16. The probability of each outcome is 1/16.
(ii) The sample space for the experiment that records the number of n-type wafers is {0, 1, 2, 3, 4}. The PMF is given by
x 0 1 2 3 4 
p(x) 0.0625 0.2500 0.3750 0.2500 0.0625 


(iii) The probability of at most one n-type wafer is 0.0625+0.25 = 0.3125. 


3. $E_1 = \{6.8, 6.9, 7.0, 7.1\}$ and $E_2 = \{6.9, 7.0, 7.1, 7.2\}$. Thus, $P(E_1) = P(E_2) = 4/5$. $E_1 \cap E_2 = \{6.9, 7.0, 7.1\}$ and $P(E_1 \cap E_2) = 3/5$. $E_1 \cup E_2 = S$ and $P(E_1 \cup E_2) = 1$. $E_1 - E_2 = \{6.8\}$ and $P(E_1 - E_2) = 1/5$. Finally, $(E_1 - E_2) \cup (E_2 - E_1) = \{6.8, 7.2\}$, so $P((E_1 - E_2) \cup (E_2 - E_1)) = 2/5$.


4. (a) If the water pH level is measured over the next two irrigations, then


(i) The sample space is $S = \{(x_1, x_2) : x_1 = 6.8, 6.9, 7.0, 7.1, 7.2, \text{ and } x_2 = 6.8, 6.9, 7.0, 7.1, 7.2\}$. The size of the sample space is 25.

(ii) The sample space of the experiment that records the average of the two pH measurements is $S = \{6.8, 6.85, 6.9, 6.95, 7, 7.05, 7.1, 7.15, 7.2\}$ and the PMF is

x 6.8 6.85 6.9 6.95 7 7.05 7.1 7.15 7.2
p(x) 0.04 0.08 0.12 0.16 0.20 0.16 0.12 0.08 0.04


(b) The probability mass function of the experiment that records the average of the pH measurements taken over the next five irrigations is

x 6.8 6.82 6.84 6.86 6.88 6.9 6.92
p(x) 0.00032 0.00160 0.00480 0.01120 0.02240 0.03872 0.05920

x 6.94 6.96 6.98 7 7.02 7.04 7.06
p(x) 0.08160 0.10240 0.11680 0.12192 0.11680 0.10240 0.08160

x 7.08 7.1 7.12 7.14 7.16 7.18 7.2
p(x) 0.05920 0.03872 0.02240 0.01120 0.00480 0.00160 0.00032
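
The two PMF tables in Problem 4 can be reproduced by direct enumeration. Below is a small Python sketch (an illustration only, not the manual's R approach; the helper name average_pmf is hypothetical) that assumes, as the solution does, that the five pH levels are equally likely and the measurements are independent.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# pH levels in tenths (68 means 6.8) so that sums are exact integers.
levels = [68, 69, 70, 71, 72]

def average_pmf(n):
    """Exact PMF of the average of n equally likely, independent measurements."""
    counts = Counter(sum(s) for s in product(levels, repeat=n))
    total = len(levels) ** n
    # A sum s in tenths over n measurements corresponds to the average s / (10 n).
    return {Fraction(s, 10 * n): Fraction(c, total) for s, c in sorted(counts.items())}

pmf2 = average_pmf(2)
pmf5 = average_pmf(5)
for avg, p in pmf5.items():
    print(float(avg), float(p))
```

Using exact fractions keeps the probabilities free of rounding, so they can be compared directly with the tabulated values.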


5. (a) The R command is sample(0:2, size=10, replace=T, prob=pr) and the following gives one possible result: 1, 0, 0, 1, 1, 0, 1, 0, 0, 0.


(b) The relative frequency based on 10,000 replications is


0 1 2
0.6897 0.2799 0.0304


(c) The histogram of the relative frequencies and line graph of the probability mass 
function is given on the next page. 
[Figure: Histogram of x (vertical axis: Density), with the line graph of the PMF superimposed.]

This figure shows that all relative frequencies are good approximations to 
corresponding probabilities and we have empirical confirmation of the limiting 
relative frequency interpretation of probability. 

6. (a) The number of ways to finish the test is $2^5 = 32$.


(b) The sample space for the experiment that records the test score is S = 
{0, 1,2, 3,4, 5}. 


(c) The PMF of X is given by 


x 0 1 2 3 4 5 
p(x) 0.03125 0.15625 0.31250 0.31250 0.15625 0.03125 


7. The number of assignments is $4! = 24$.


8. The probability is

26° x 10°


9. (a) The number of possible committees is $\binom{12}{4} = 495$.
(b) The number of committees consisting of 2 biologists, 1 chemist, and 1 physicist is the product of three binomial coefficients, one for each discipline, and equals 120.


(c) The probability is 120/495 = 0.2424.


10. (a) The number of possible selections is $\binom{10}{5} = 252$.


(b) The number of divisions of the 10 players into two teams of 5 is 252/2 = 126.
(c) The number of handshakes is $\binom{12}{2} = 66$.


11. (a) In order to go from the lower left corner to the upper right corner, we need to move a total of 8 steps, with 4 steps to the right and 4 steps upwards. Thus, the total number of paths is $\binom{8}{4} = 70$.


(b) We decompose the move into two stages: stage 1 is from the lower left corner to the circled point, which needs 5 steps with 3 steps to the right and 2 steps upwards; stage 2 is from the circled point to the upper right corner, which needs 3 steps with 1 step to the right and 2 steps upwards. Thus, the total number of paths passing the circled point is $\binom{5}{3}\binom{3}{1} = 30$.


(c) The probability is 30/70 = 3/7.


12. (a) In order to keep the system working, the nonfunctioning antennas cannot be next to each other. There are 8 antennas functioning; thus, the 5 nonfunctioning antennas must be in the 9 spaces created by the 8 functioning antennas. The number of arrangements is $\binom{9}{5} = 126$.


(b) The total number of arrangements of the 5 nonfunctioning antennas among the 13 antennas is $\binom{13}{5} = 1287$. Thus, the required probability is 126/1287 = 0.0979.
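
The gap-counting argument in Problem 12 can be checked by brute force. The sketch below is in Python (not the manual's R) and assumes the setup implied by the solution: 13 antennas in a line, 5 of them nonfunctioning, with the system working exactly when no two nonfunctioning antennas are adjacent.

```python
from itertools import combinations
from math import comb

n_total, n_bad = 13, 5

# combinations() yields sorted position tuples, so adjacency is a gap of 1.
working = sum(
    1
    for bad in combinations(range(n_total), n_bad)
    if all(b - a > 1 for a, b in zip(bad, bad[1:]))
)
total = comb(n_total, n_bad)
probability = working / total
print(working, total, round(probability, 4))
```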
13. (a) The total number of selections is $\binom{15}{5} = 3003$.


(b) The number of selections containing three defective buses is $\binom{4}{3}\binom{11}{2} = 220$.


(c) The asked probability is 220/3003 = 0.07326.
(d) The probability that all five buses are free of the defect is calculated as $\binom{11}{5} / \binom{15}{5} = 462/3003 = 0.1538$.
14. (a) The number of samples of size five is $\binom{30}{5} = 142506$.


(b) The number of samples that include two of the six tagged moose is $\binom{6}{2}\binom{24}{3} = 30360$.


(c)
(i) The probability is $\binom{6}{2}\binom{24}{3} / \binom{30}{5} = 30360/142506 = 0.213$.
(ii) The probability is $\binom{24}{5} / \binom{30}{5} = 42504/142506 = 0.298$.
15. (a) The probability is $\binom{48}{5} / \binom{52}{5} = 0.6588$.
(b) The probability is $\binom{4}{2}\binom{4}{2}\binom{44}{1} / \binom{52}{5} = 0.00061$.
(c) The probability is $\binom{4}{3}\binom{12}{2} 4^2 / \binom{52}{5} = 0.0016$.
16. The total number of possible assignments is $\binom{10}{2,2,2,2,2} = \frac{10!}{2!\,2!\,2!\,2!\,2!} = 113400$.


17. (a) There are $3^{15} = 14348907$ ways to classify the next 15 shingles into three grades.

(b) The number of ways to classify into three high, five medium and seven low grades is $\binom{15}{3,5,7} = \frac{15!}{3!\,5!\,7!} = 360360$.


(c) The probability is 360360/14348907 = 0.0251.
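
The multinomial count in Problem 17 is easy to verify numerically. The following Python sketch (illustrative only; the helper name multinomial is not from the manual) computes the coefficient from factorials and the resulting probability under the solution's assumption that all $3^{15}$ classifications are equally likely.

```python
from math import factorial

def multinomial(*counts):
    """Number of ways to split sum(counts) items into groups of the given sizes."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)  # exact at every step
    return result

ways = multinomial(3, 5, 7)   # three high, five medium, seven low
total = 3 ** 15               # every shingle gets one of three grades
probability = ways / total
print(ways, total, round(probability, 4))
```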


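18. (a) $2^n = (1+1)^n = \sum_{k=0}^{n} \binom{n}{k} 1^k 1^{n-k} = \sum_{k=0}^{n} \binom{n}{k}$.
(b) $(a^2 + b)^4 = \binom{4}{0}(a^2)^0 b^4 + \binom{4}{1}(a^2)^1 b^3 + \binom{4}{2}(a^2)^2 b^2 + \binom{4}{3}(a^2)^3 b + \binom{4}{4}(a^2)^4 b^0$
$= b^4 + 4a^2b^3 + 6a^4b^2 + 4a^6b + a^8$.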
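19. By the multinomial theorem,

$(a_1^2 + 2a_2 + a_3)^3 = \binom{3}{0,0,3}(a_1^2)^0(2a_2)^0a_3^3 + \binom{3}{0,1,2}(a_1^2)^0(2a_2)^1a_3^2 + \binom{3}{0,2,1}(a_1^2)^0(2a_2)^2a_3^1 + \binom{3}{0,3,0}(a_1^2)^0(2a_2)^3a_3^0$
$+ \binom{3}{1,0,2}(a_1^2)^1(2a_2)^0a_3^2 + \binom{3}{1,1,1}(a_1^2)^1(2a_2)^1a_3^1 + \binom{3}{1,2,0}(a_1^2)^1(2a_2)^2a_3^0$
$+ \binom{3}{2,0,1}(a_1^2)^2(2a_2)^0a_3^1 + \binom{3}{2,1,0}(a_1^2)^2(2a_2)^1a_3^0 + \binom{3}{3,0,0}(a_1^2)^3(2a_2)^0a_3^0$
$= a_3^3 + 6a_2a_3^2 + 12a_2^2a_3 + 8a_2^3 + 3a_1^2a_3^2 + 12a_1^2a_2a_3 + 12a_1^2a_2^2 + 3a_1^4a_3 + 6a_1^4a_2 + a_1^6$.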


2.4 Axioms and Properties of Probabilities 


1. $P(A \cap B) = P(A) + P(B) - P(A \cup B) = 0.37 + 0.23 - 0.47 = 0.13$.


2. (a) $P(A_1) = \cdots = P(A_m) = 1/m$.
(b) If $m = 8$, $P(A_1 \cup A_2 \cup A_3 \cup A_4) = P(A_1) + P(A_2) + P(A_3) + P(A_4) = 4 \times 1/m = 1/2$.


3. (a) The R commands are
t = c(50,51,52,53); G=expand.grid(X1=t,X2=t,X3=t); attach(G)
table((X1+X2+X3)/3)/length(X1)


The resulting PMF is


x 50 50.33 50.67 51 51.33
p(x) 0.015625 0.046875 0.093750 0.156250 0.187500

x 51.67 52 52.33 52.67 53
p(x) 0.187500 0.156250 0.093750 0.046875 0.015625


(b) The probability that the average gas mileage is at least 52 MPG is 0.156250 + 
0.093750 + 0.046875 + 0.015625 = 0.3125. 
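
As a cross-check of Problem 3, the same PMF can be built by enumerating the 64 equally likely triples. This is a Python sketch rather than the manual's R code, and the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

mpg = [50, 51, 52, 53]

# Count how many of the 4^3 = 64 triples produce each sum.
counts = Counter(sum(t) for t in product(mpg, repeat=3))
pmf = {Fraction(s, 3): Fraction(c, 64) for s, c in sorted(counts.items())}

# Probability that the average is at least 52 MPG, as in part (b).
p_at_least_52 = sum(p for avg, p in pmf.items() if avg >= 52)
print(float(p_at_least_52))
```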


4. (a)


(i) $E_1 = \{5, 6, 7, 8, 9, 10, 11, 12\}$. $P(E_1) = 4/36 + 5/36 + 6/36 + 5/36 + 4/36 + 3/36 + 2/36 + 1/36 = 5/6$.
(ii) $E_2 = \{2, 3, 4, 5, 6, 7, 8\}$. $P(E_2) = 1/36 + 2/36 + 3/36 + 4/36 + 5/36 + 6/36 + 5/36 = 13/18$.
(iii) $E_3 = E_1 \cup E_2 = \{2, 3, \ldots, 12\}$, $P(E_3) = 1$. $E_4 = E_1 - E_2 = \{9, 10, 11, 12\}$, $P(E_4) = 4/36 + 3/36 + 2/36 + 1/36 = 5/18$. $E_5 = E_1^c \cap E_2^c = \emptyset$, $P(E_5) = 0$.
(b) $P(E_3) = P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2) = 30/36 + 26/36 - (4/36 + 5/36 + 6/36 + 5/36) = 1$.
(c) $P(E_5) = P(E_1^c \cap E_2^c) = P((E_1 \cup E_2)^c) = P(E_3^c) = 1 - P(E_3) = 1 - 1 = 0$.


5. (a)

(i) $E_1 = \{(>3, V), (<3, V)\}$, $P(E_1) = 0.25 + 0.3 = 0.55$.

(ii) $E_2 = \{(<3, V), (<3, D), (<3, F)\}$, $P(E_2) = 0.3 + 0.15 + 0.13 = 0.58$.

(iii) $E_3 = \{(>3, D), (<3, D)\}$, $P(E_3) = 0.1 + 0.15 = 0.25$.

(iv) $E_4 = \{(>3, V), (<3, V), (<3, D), (<3, F)\}$, $P(E_4) = 0.25 + 0.3 + 0.15 + 0.13 = 0.83$. $E_5 = \{(>3, V), (<3, V), (<3, D), (<3, F), (>3, D)\}$, $P(E_5) = 0.25 + 0.3 + 0.15 + 0.13 + 0.1 = 0.93$.

(b) $P(E_4) = P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2) = 0.55 + 0.58 - 0.3 = 0.83$.


(c) $P(E_5) = P(E_1 \cup E_2 \cup E_3) = P(E_1) + P(E_2) + P(E_3) - P(E_1 \cap E_2) - P(E_1 \cap E_3) - P(E_2 \cap E_3) + P(E_1 \cap E_2 \cap E_3) = 0.55 + 0.58 + 0.25 - 0.3 - 0 - 0.15 + 0 = 0.93$.


6. (a) The probability that, in any given hour, only machine A produces a batch with no defects is

$P(E_1 \cap E_2^c) = P(E_1) - P(E_1 \cap E_2) = 0.95 - 0.88 = 0.07$.

(b) The probability that, in any given hour, only machine B produces a batch with no defects is

$P(E_2 \cap E_1^c) = P(E_2) - P(E_1 \cap E_2) = 0.92 - 0.88 = 0.04$.

(c) The probability that exactly one machine produces a batch with no defects is

$P((E_1 \cap E_2^c) \cup (E_2 \cap E_1^c)) = P(E_1 \cap E_2^c) + P(E_2 \cap E_1^c) = 0.07 + 0.04 = 0.11$.

(d) The probability that at least one machine produces a batch with no defects is

$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2) = 0.95 + 0.92 - 0.88 = 0.99$.


7. The probability that at least one of the machines will produce a batch with no defectives is

$P(E_1 \cup E_2 \cup E_3) = P(E_1) + P(E_2) + P(E_3) - P(E_1 \cap E_2) - P(E_1 \cap E_3) - P(E_2 \cap E_3) + P(E_1 \cap E_2 \cap E_3)$
$= 0.95 + 0.92 + 0.9 - 0.88 - 0.87 - 0.85 + 0.82 = 0.99$.


8. (a)
(i) $P(E_1) = 0.10 + 0.04 + 0.02 + 0.08 + 0.30 + 0.06 = 0.6$.
(ii) $P(E_2) = 0.10 + 0.08 + 0.06 + 0.04 + 0.30 + 0.14 = 0.72$.
(iii) $P(E_1 \cap E_2) = 0.1 + 0.04 + 0.08 + 0.3 = 0.52$.
(b) The probability mass function for the experiment that records only the online 


monthly volume of sales category is given as 


Online Sales 0 1 2 
Probability 0.16 0.44 0.4 


9. Let

$E_4$ = {at least two of the original four components work},

and

$E_5$ = {at least three of the original four components work} $\cup$ {two of the original four components work and the additional component works}.

Then $E_4 \not\subseteq E_5$ because

$B$ = {exactly two of the original four components work and the additional component does not work},

which is part of $E_4$, is not in $E_5$. Thus, $E_4 \not\subseteq E_5$ and, hence, it is not necessarily true that $P(E_4) \le P(E_5)$.


10. (a) If two dice are rolled, there are a total of 36 possibilities, among which 6 are 
tied. Hence, the probability of tie is 6/36 = 1/6. 
(b) By symmetry of the game, P(A wins) = P(B wins) and P(A wins) + P(B wins) + P(tie) = 1. Using the result of (a), we can solve that P(A wins) = P(B wins) = 5/12.


11. (a) $A > B$ = {die A results in 4}, $B > C$ = {die C results in 2},
$C > D$ = {die C results in 6, or die C results in 2 and die D results in 1},
$D > A$ = {die D results in 5, or die D results in 1 and die A results in 0}.


(b) P(A > B) =4/6, P(B > C) = 4/6, P(C > D) =4/6, P(D > A) =4/6. 
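
The four probabilities in Problem 11 can be confirmed by enumerating all 36 face pairs for each matchup. The Python sketch below is an illustration only; the face values are an assumption, namely the standard nontransitive (Efron) dice, which are consistent with the events listed in (a) but are not spelled out in this solution.

```python
from fractions import Fraction
from itertools import product

# Assumed face values (standard Efron dice); not stated in the solution itself.
dice = {
    "A": [4, 4, 4, 4, 0, 0],
    "B": [3, 3, 3, 3, 3, 3],
    "C": [6, 6, 2, 2, 2, 2],
    "D": [5, 5, 5, 1, 1, 1],
}

def p_beats(x, y):
    """Exact probability that die x shows a larger value than die y."""
    wins = sum(1 for a, b in product(dice[x], dice[y]) if a > b)
    return Fraction(wins, 36)

matchups = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
results = {pair: p_beats(*pair) for pair in matchups}
for pair, p in results.items():
    print(pair, p)
```

Each die beats the next one in the cycle with the same probability, which is what makes the set nontransitive.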
12. (a) If the participant sticks with the original choice, the probability of winning the 
big prize is 1/3. 


(b) If the participant chooses to switch his/her choice, the probability of winning the big prize is 2/3. This is because if the first choice was actually a minor prize, then, after switching, he/she will win the big prize, whereas if the first choice was actually the big prize, after switching he/she will win a minor prize. Since the first choice is a minor prize with probability 2/3, switching leads to a probability of 2/3 of winning the big prize.
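
The argument in Problem 12 can be checked by exact enumeration. The Python sketch below assumes the usual setup implied by the solution: three choices, one hiding the big prize, a uniformly random first pick, and a host who then reveals a minor prize among the two remaining choices. When the first pick is already the big prize the host has two options; the code simply takes the first, which does not affect either probability.

```python
from fractions import Fraction
from itertools import product

doors = range(3)
stick_wins = 0
switch_wins = 0

# Enumerate the 9 equally likely (prize location, first pick) combinations.
for prize, pick in product(doors, doors):
    # The host opens a door that is neither the pick nor the big prize.
    opened = next(d for d in doors if d != pick and d != prize)
    # Switching means taking the one door that is neither picked nor opened.
    switched = next(d for d in doors if d != pick and d != opened)
    stick_wins += (pick == prize)
    switch_wins += (switched == prize)

p_stick = Fraction(stick_wins, 9)
p_switch = Fraction(switch_wins, 9)
print(p_stick, p_switch)
```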


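13. To prove that $P(\emptyset) = 0$, let $E_1 = S$ and $E_i = \emptyset$ for $i = 2, 3, \ldots$. Then $E_1, E_2, \ldots$ is a sequence of disjoint events. By Axiom 3, we have

$P(S) = P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i) = P(S) + \sum_{i=2}^{\infty} P(\emptyset)$,

which implies that $\sum_{i=2}^{\infty} P(\emptyset) = 0$, and we must have $P(\emptyset) = 0$.


To prove (2) of Proposition 2.4-1, let $E_i = \emptyset$ for $i = n+1, n+2, \ldots$. Then $E_1, E_2, \ldots$ is a sequence of disjoint events and $\bigcup_{i=1}^{\infty} E_i = \bigcup_{i=1}^{n} E_i$. By Axiom 3, we have

$P\left(\bigcup_{i=1}^{n} E_i\right) = P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i) = \sum_{i=1}^{n} P(E_i) + \sum_{i=n+1}^{\infty} P(E_i)$
$= \sum_{i=1}^{n} P(E_i) + \sum_{i=n+1}^{\infty} P(\emptyset) = \sum_{i=1}^{n} P(E_i)$,

which is what was to be proved.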


2.5 Conditional Probability 


1. The probability can be calculated as

$P(>3 \mid >2) = \frac{P((>3) \cap (>2))}{P(>2)} = \frac{P(>3)}{P(>2)} = \frac{(1+3)^{-2}}{(1+2)^{-2}} = \frac{9}{16}$.
2. Let $B$ = {system re-evaluation occurs} and $C$ = {a component is individually replaced}. Consider a new experiment with reduced sample space $A = B \cup C$. The desired probability is the probability of $B$ in this new experiment, which is calculated as

$P(B|A) = \frac{P(B \cap A)}{P(A)} = \frac{P(B)}{P(B) + P(C)} = \frac{0.005}{0.005 + 0.1} = 0.048$.


3. (a) $P(A) = 0.132 + 0.068 = 0.2$.
(b) $P(A \cap B) = 0.132$, thus $P(B|A) = P(A \cap B)/P(A) = 0.132/0.2 = 0.66$.
(c) $P(X = 1) = 0.2$, $P(X = 2) = 0.3$, $P(X = 3) = 0.5$.


4. We let $B_1$, $B_2$, $B_o$ be the events that the TV is brand 1, brand 2, and other brand, respectively. Let $R$ be the event that the TV needs warranty repair.


(a) $P(B_1 \cap R) = P(B_1)P(R|B_1) = 0.5 \times 0.1 = 0.05$.
(b) The tree diagram is

$B_1$ (0.5): $R$ (0.1), $R^c$ (0.9)
$B_2$ (0.3): $R$ (0.2), $R^c$ (0.8)
$B_o$ (0.2): $R$ (0.25), $R^c$ (0.75)


(c) Using the diagram,

$P(R) = P(B_1)P(R|B_1) + P(B_2)P(R|B_2) + P(B_o)P(R|B_o)$
$= 0.5 \times 0.1 + 0.3 \times 0.2 + 0.2 \times 0.25 = 0.16$.


5. (a) The probability is 0.36 x 0.58 = 0.2088.
(b) The tree diagram is

Car (0.36): US (0.42): Lease (0.2), Buy (0.8)
Car (0.36): Imported (0.58): Lease (0.35), Buy (0.65)
Truck (0.64): US (0.7): Lease (0.2), Buy (0.8)
Truck (0.64): Imported (0.3): Lease (0.35), Buy (0.65)


(c) By using the tree diagram, the probability that the next consumer will lease his/her vehicle is

0.36 x 0.42 x 0.2 + 0.36 x 0.58 x 0.35 + 0.64 x 0.7 x 0.2 + 0.64 x 0.3 x 0.35 = 0.26.


6. (a) $P(\text{no defect} \cap A) = P(\text{no defect}|A)P(A) = 0.99 \times 0.3 = 0.297$.


(b) $P(\text{no defect} \cap B) = P(\text{no defect}|B)P(B) = 0.97 \times 0.3 = 0.291$, and $P(\text{no defect} \cap C) = P(\text{no defect}|C)P(C) = 0.92 \times 0.3 = 0.276$.


(c) $P(\text{no defect}) = P(\text{no defect} \cap A) + P(\text{no defect} \cap B) + P(\text{no defect} \cap C) = 0.297 + 0.291 + 0.276 = 0.864$.


(d) $P(C|\text{no defect}) = P(\text{no defect} \cap C)/P(\text{no defect}) = 0.276/0.864 = 0.3194$.


7. (a) The tree diagram is

C-section (0.15): Survive (0.96), Not survive (0.04)
Not C-section (0.85): Survive, Not survive


(b) From the given information, we have
P(Survive) = 0.15 x 0.96 + 0.85 x P(Survive|Not C-section) = 0.98.
Solving this equation gives us P(Survive|Not C-section) = 0.984.


8. Let $B$ be the event that the credit card holder carries a monthly balance; then $P(B) = 0.7$ and $P(B^c) = 0.3$. Let $L$ be the event that the card holder has annual income less than $20,000; then $P(L|B) = 0.3$ and $P(L|B^c) = 0.2$.



(a) $P(L) = P(L|B)P(B) + P(L|B^c)P(B^c) = 0.3 \times 0.7 + 0.2 \times 0.3 = 0.27$.
(b) $P(B|L) = P(L|B)P(B)/P(L) = 0.3 \times 0.7/0.27 = 0.778$.


9. Let $A$ be the event that the plant is alive and let $W$ be the event that the roommate waters it. Then, from the given information, $P(W) = 0.85$ and $P(W^c) = 0.15$; $P(A|W) = 0.9$ and $P(A|W^c) = 0.2$.

(a) $P(A) = P(A|W)P(W) + P(A|W^c)P(W^c) = 0.9 \times 0.85 + 0.2 \times 0.15 = 0.795$.

(b) $P(W|A) = P(A|W)P(W)/P(A) = 0.9 \times 0.85/0.795 = 0.962$.
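
Problems 8 and 9 both apply the law of total probability followed by Bayes' theorem to a two-way partition. A minimal Python sketch of that pattern is given below (the function names are illustrative and not part of the manual).

```python
def total_probability(prior, p_given, p_given_not):
    """P(A) = P(A|W) P(W) + P(A|W^c) P(W^c) for a two-event partition."""
    return p_given * prior + p_given_not * (1 - prior)

def posterior(prior, p_given, p_given_not):
    """P(W|A) = P(A|W) P(W) / P(A), by Bayes' theorem."""
    return p_given * prior / total_probability(prior, p_given, p_given_not)

# Problem 9: P(W) = 0.85, P(A|W) = 0.9, P(A|W^c) = 0.2.
p_alive = total_probability(0.85, 0.9, 0.2)
p_watered_given_alive = posterior(0.85, 0.9, 0.2)
print(round(p_alive, 3), round(p_watered_given_alive, 3))
```

Calling the same two functions with (0.7, 0.3, 0.2) gives the quantities of Problem 8.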

10. Let $D_1$ be the event that the first is defective and $D_2$ the event that the second is defective.

(a) $P(\text{no defective}) = P(D_1^c \cap D_2^c) = P(D_2^c|D_1^c)P(D_1^c) = 6/9 \times 7/10 = 0.467$.

(b) $X$ can be 0, 1, or 2. We already calculated $P(X = 0) = P(\text{no defective}) = 0.467$. $P(X = 2) = P(D_1 \cap D_2) = P(D_2|D_1)P(D_1) = 2/9 \times 3/10 = 0.067$. Thus, $P(X = 1) = 1 - P(X = 0) - P(X = 2) = 0.466$.

(c) $P(D_1|X = 1) = P(D_1 \cap D_2^c)/P(X = 1) = P(D_2^c|D_1)P(D_1)/P(X = 1) = 7/9 \times 0.3/0.466 = 0.5$.


11. Let $L_1, L_2, L_3, L_4$ be the events that the radar traps are operated at the 4 locations; then $P(L_1) = 0.4$, $P(L_2) = 0.3$, $P(L_3) = 0.2$, $P(L_4) = 0.3$. Let $S$ be the event that the person is speeding to work; then $P(S|L_1) = 0.2$, $P(S|L_2) = 0.1$, $P(S|L_3) = 0.5$, $P(S|L_4) = 0.2$.


(a) $P(S) = P(S|L_1)P(L_1) + P(S|L_2)P(L_2) + P(S|L_3)P(L_3) + P(S|L_4)P(L_4) = 0.2 \times 0.4 + 0.1 \times 0.3 + 0.5 \times 0.2 + 0.2 \times 0.3 = 0.27$.

(b) $P(L_2|S) = P(S|L_2)P(L_2)/P(S) = 0.1 \times 0.3/0.27 = 0.11$.
12. Let $D$ be the event that the aircraft will be discovered, and $E$ be the event that it has an emergency locator. From the problem, $P(D) = 0.7$ and $P(D^c) = 0.3$; $P(E|D) = 0.6$ and $P(E|D^c) = 0.1$.

(a) $P(E \cap D^c) = P(E|D^c)P(D^c) = 0.1 \times 0.3 = 0.03$.

(b) $P(E) = P(E|D^c)P(D^c) + P(E|D)P(D) = 0.1 \times 0.3 + 0.6 \times 0.7 = 0.45$.

(c) $P(D^c|E) = P(E \cap D^c)/P(E) = 0.03/0.45 = 0.067$.


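13. $\text{RHS} = P(E_1) \cdot \frac{P(E_1 \cap E_2)}{P(E_1)} \cdot \frac{P(E_1 \cap E_2 \cap E_3)}{P(E_1 \cap E_2)} \cdots \frac{P(E_1 \cap E_2 \cap \cdots \cap E_{n-1} \cap E_n)}{P(E_1 \cap E_2 \cap \cdots \cap E_{n-1})}$

$= P(E_1 \cap E_2 \cap \cdots \cap E_{n-1} \cap E_n) = \text{LHS}$.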
2.6 Independent Events


1. From the given information, $P(E_2) = 2/10$ and $P(E_2|E_1) = 2/9$, thus $P(E_2) \ne P(E_2|E_1)$. Consequently, $E_1$ and $E_2$ are not independent.


2. We can calculate from the table that $P(X = 1) = 0.132 + 0.068 = 0.2$ and $P(Y = 1) = 0.132 + 0.24 + 0.33 = 0.702$, thus $P(X = 1)P(Y = 1) = 0.2 \times 0.702 = 0.1404 \ne 0.132 = P(X = 1, Y = 1)$. Thus, the events $[X = 1]$ and $[Y = 1]$ are not independent.

3. (a) The probability is $0.9^{10} = 0.349$.


(b) The probability is $0.1 \times 0.9^9 = 0.0387$.
(c) The probability is $10 \times 0.1 \times 0.9^9 = 0.387$.


4. A total of 8 fuses being inspected means that the first 7 are not defective and the 8th is defective, thus the probability is calculated as $0.99^7 \times 0.01 = 0.0093$.


5. Assume that the cars assembled on each line are independent, and also that the two lines are independent. We have

(a) The probability of finding zero nonconformance in the sample from line 1 is $0.8^4 = 0.410$.

(b) The probability of finding zero nonconformance in the sample from line 2 is $0.9^3 = 0.729$.
(c) The probability is $0.8^4 \times 0.9^3 = 0.2986$.


6. Yes. By the given information, $P(T|M) = P(T)$, we see that $T$ and $M$ are independent. Thus, $T$ and $F = M^c$ are also independent; that is, $P(T|F) = P(T)$.


7. (a) The completed table is given as


       | Football | Basketball | Track | Total
Male   | 0.3      | 0.22       | 0.13  | 0.65
Female | 0        | 0.28       | 0.07  | 0.35
Total  | 0.3      | 0.5        | 0.2   | 1


(b) Let $B$ be the event that the student prefers basketball, then $P(F|B) = P(F \cap B)/P(B) = 0.28/0.5 = 0.56$.
(c) $F$ and $B$ are not independent because $P(F|B) = 0.56 \ne 0.35 = P(F)$.


8. We can write 
and
$E_3 = \{(1, 4), (2, 4), (3, 4), (4, 4), (5, 4), (6, 4)\}$.


Thus, $E_1 \cap E_2 = E_1 \cap E_3 = E_2 \cap E_3 = \{(3, 4)\}$, and $E_1 \cap E_2 \cap E_3 = \{(3, 4)\}$. Hence, $P(E_1) = P(E_2) = P(E_3) = 1/6$, and $P(E_1 \cap E_2) = P(E_1 \cap E_3) = P(E_2 \cap E_3) = 1/36$, which shows that $E_1, E_2, E_3$ are pairwise independent. But $P(E_1 \cap E_2 \cap E_3) = 1/36 \ne 1/216 = P(E_1)P(E_2)P(E_3)$.


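9. Since $E_1, E_2, E_3$ are independent, we have

$P(E_1 \cap (E_2 \cup E_3)) = P((E_1 \cap E_2) \cup (E_1 \cap E_3)) = P(E_1 \cap E_2) + P(E_1 \cap E_3) - P(E_1 \cap E_2 \cap E_3)$
$= P(E_1)P(E_2) + P(E_1)P(E_3) - P(E_1)P(E_2)P(E_3)$
$= P(E_1)[P(E_2) + P(E_3) - P(E_2)P(E_3)] = P(E_1)P(E_2 \cup E_3)$,
which proves the independence between $E_1$ and $E_2 \cup E_3$.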


10. Let $E_1, E_2, E_3, E_4$ be the events that components 1, 2, 3, 4 function, respectively; then

$P(\text{system functions}) = P((E_1 \cap E_2) \cup (E_3 \cap E_4)) = P(E_1 \cap E_2) + P(E_3 \cap E_4) - P(E_1 \cap E_2 \cap E_3 \cap E_4)$
$= P(E_1)P(E_2) + P(E_3)P(E_4) - P(E_1)P(E_2)P(E_3)P(E_4)$
$= 2 \times 0.9^2 - 0.9^4 = 0.9639$.


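11. Let $A$ denote the event that the system functions and $A_i$ denote the event that component $i$ functions, $i = 1, 2, 3, 4$. In mathematical notation,

$A = (A_1 \cap A_2 \cap A_3 \cap A_4) \cup (A_1^c \cap A_2 \cap A_3 \cap A_4) \cup (A_1 \cap A_2^c \cap A_3 \cap A_4) \cup (A_1 \cap A_2 \cap A_3^c \cap A_4) \cup (A_1 \cap A_2 \cap A_3 \cap A_4^c)$.

Thus

$P(A) = P(A_1 \cap A_2 \cap A_3 \cap A_4) + P(A_1^c \cap A_2 \cap A_3 \cap A_4) + P(A_1 \cap A_2^c \cap A_3 \cap A_4) + P(A_1 \cap A_2 \cap A_3^c \cap A_4) + P(A_1 \cap A_2 \cap A_3 \cap A_4^c)$
$= 4 \times 0.9^3 \times 0.1 + 0.9^4 = 0.9477$.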
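
Both reliability calculations (Problems 10 and 11) can be verified by summing the probabilities of all 16 component states in which the system functions. The Python sketch below assumes, as the solutions do, four independent components each working with probability 0.9; the function name reliability and the two structure functions are illustrative.

```python
from itertools import product

P_WORK = 0.9  # probability that a single component functions

def reliability(system_works, n=4):
    """Sum the probabilities of all component states in which the system works."""
    total = 0.0
    for state in product([True, False], repeat=n):
        if system_works(state):
            prob = 1.0
            for up in state:
                prob *= P_WORK if up else 1 - P_WORK
            total += prob
    return total

# Problem 10: components 1-2 in series, in parallel with components 3-4 in series.
series_parallel = reliability(lambda s: (s[0] and s[1]) or (s[2] and s[3]))
# Problem 11: the system functions when at least three of the four components do.
three_of_four = reliability(lambda s: sum(s) >= 3)
print(round(series_parallel, 4), round(three_of_four, 4))
```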
Random Variables and Their Distributions


3.2 Describing a Probability Distribution 


1. (a) $p_1(x)$ is not a valid probability mass function but $p_2(x)$ is.


(b) To find $k$, solve the equation $0.2k + 0.3k + 0.4k + 0.2k = 1$. The solution is $k = 1/1.1 = 10/11$.


2. (a) The CDF of $X$ is

$F(x) = \begin{cases} 0, & \text{if } x < 0 \\ 0.05, & \text{if } 0 \le x < 1 \\ 0.15, & \text{if } 1 \le x < 2 \\ 0.3, & \text{if } 2 \le x < 3 \\ 0.55, & \text{if } 3 \le x < 4 \\ 0.9, & \text{if } 4 \le x < 5 \\ 1, & \text{if } x \ge 5. \end{cases}$

[Figure: The CDF plot of X.]

(b) $P(1 \le X \le 4) = P(X \le 4) - P(X < 1) = F(4) - F(0) = 0.9 - 0.05 = 0.85$.


3. (a) $P(Y \ge 2) = 1 - P(Y < 2) = 1 - 0.7 = 0.3$.

[Figure: The CDF plot of Y.]

(b) The possible values of $Y$ are the jumping points 0, 1, 2, 3, and the probability at a jumping point is the size of the jump. Thus $p(0) = 0.2 - 0 = 0.2$, $p(1) = 0.7 - 0.2 = 0.5$, $p(2) = 0.9 - 0.7 = 0.2$, and $p(3) = 1 - 0.9 = 0.1$.
4. $X$ can assume values 0, 1, 2, and 3. We have the PMF of $X$ as

$p(0) = P(X = 0) = \binom{3}{0}\binom{7}{3}/\binom{10}{3} = 0.292$, $p(1) = P(X = 1) = \binom{3}{1}\binom{7}{2}/\binom{10}{3} = 0.525$,

$p(2) = P(X = 2) = \binom{3}{2}\binom{7}{1}/\binom{10}{3} = 0.175$, $p(3) = P(X = 3) = \binom{3}{3}\binom{7}{0}/\binom{10}{3} = 0.008$.

The CDF is $F(x) = 0$ if $x < 0$; if $0 \le x < 1$, $F(x) = p(0) = 0.292$; if $1 \le x < 2$, $F(x) = p(0) + p(1) = 0.292 + 0.525 = 0.817$; if $2 \le x < 3$, $F(x) = p(0) + p(1) + p(2) = 0.292 + 0.525 + 0.175 = 0.992$; if $3 \le x$, $F(x) = p(0) + p(1) + p(2) + p(3) = 1$.
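
The PMF and CDF values in Problem 4 can be recomputed directly. The Python sketch below is illustrative and rests on an assumption: a hypergeometric setting with 3 items of the type being counted, 7 of the other type, and a sample of size 3. These parameters were chosen because they reproduce the four probabilities quoted in the solution.

```python
from itertools import accumulate
from math import comb

def hyper_pmf(x, m1, m2, n):
    """P(X = x) when n items are drawn without replacement from m1 + m2 items."""
    return comb(m1, x) * comb(m2, n - x) / comb(m1 + m2, n)

# Assumed parameters: m1 = 3, m2 = 7, n = 3.
pmf = [hyper_pmf(x, 3, 7, 3) for x in range(4)]
cdf = list(accumulate(pmf))  # running sums give F(0), F(1), F(2), F(3)
print([round(p, 3) for p in pmf])
print([round(c, 3) for c in cdf])
```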


5. (a) $f_1(x)$ is not a valid PDF because $f_1(x) < 0$ for some $x \in (0, 2)$; for example, $f_1(1.9) = -0.58$. $f_2(x)$ is a valid PDF because it is easy to verify that $f_2(x) \ge 0$ and $\int_{-\infty}^{\infty} f_2(x)\,dx = 1$.
(b)

(i) To find $k$, we must have $\int_{-\infty}^{\infty} f(x)\,dx = 1$, that is, $\int_8^{10} kx\,dx = 1$. Solving the equation, we have $k = 1/18$.
It is clear that $F_X(x) = 0$ if $x < 8$ and $F_X(x) = 1$ if $x > 10$. For $x \in [8, 10]$,

$F_X(x) = \int_8^x \frac{1}{18}t\,dt = \frac{1}{36}t^2\Big|_8^x = \frac{x^2 - 64}{36}$.

Using the CDF,

$P(8.6 < X < 9.8) = F_X(9.8) - F_X(8.6) = \frac{9.8^2 - 64}{36} - \frac{8.6^2 - 64}{36} = 0.61333$.

(ii)

$P(X < 9.8 \mid X > 8.6) = \frac{P(8.6 < X < 9.8)}{P(X > 8.6)} = \frac{P(8.6 < X < 9.8)}{1 - P(X \le 8.6)} = \frac{0.61333}{1 - F_X(8.6)} = \frac{0.61333}{1 - \frac{8.6^2 - 64}{36}} = 0.8479$.


6. $X \sim U(0, 1)$ and $Y = 3 + 6X$, so clearly the sample space for $Y$ is $(3, 9)$. Thus, the CDF of $Y$ satisfies $F_Y(y) = 0$ if $y \le 3$ and $F_Y(y) = 1$ if $y \ge 9$. For $y \in (3, 9)$, we calculate $F_Y(y)$ as

$F_Y(y) = P(Y \le y) = P(3 + 6X \le y) = P\left(X \le \frac{y - 3}{6}\right) = \frac{y - 3}{6}$.

Comparing to the CDF of $U(A, B)$ in Example 3.2-5, we can see that $F_Y(y)$ is the CDF of $U(3, 9)$, hence $Y \sim U(3, 9)$.
7. The sample space of $Y$ is $(0, \infty)$. If $y \le 0$, clearly, the CDF $F_Y(y) = 0$. For $y > 0$,
$F_Y(y) = P(Y \le y) = P(-\log X \le y) = P(X \ge e^{-y}) = 1 - P(X < e^{-y}) = 1 - F_X(e^{-y}) = 1 - e^{-y}$.
The PDF of $Y$ is $f_Y(y) = 0$ if $y < 0$ and $f_Y(y) = e^{-y}$ for $y > 0$.
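8. (a) $P(0.5 < X < 1) = F(1) - F(0.5) = 1/4 - 1/16 = 0.1875$.
(b) Taking the derivative of $F(x)$, we find the PDF of $X$ is

$f(x) = \begin{cases} x/2, & 0 \le x \le 2 \\ 0, & \text{otherwise.} \end{cases}$

(c) Since $Y$ is in seconds, $Y = 60X$. Thus, $F_Y(y) = 0$ for $y < 0$ and $F_Y(y) = 1$ for $y > 120$; for $0 \le y \le 120$,

$F_Y(y) = P(Y \le y) = P(60X \le y) = P\left(X \le \frac{y}{60}\right) = \frac{1}{4}\left(\frac{y}{60}\right)^2 = \frac{y^2}{14400}$.

Taking the derivative of $F_Y(y)$, we find the PDF of $Y$ is

$f_Y(y) = \begin{cases} y/7200, & 0 \le y \le 120 \\ 0, & \text{otherwise.} \end{cases}$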


9. (a)
$P(X > 10) = P(30/D > 10) = P(D < 3) = \frac{3^2}{81} = 1/9$.


(b) Since $D$ must be between 0 and 9, the sample space of $X = 30/D$ is $(30/9, \infty)$. We first calculate the CDF of $X$. Clearly, $F_X(x) = 0$ if $x \le 30/9$. For $x > 30/9$, we have

$F_X(x) = P(30/D \le x) = P(D \ge 30/x) = 1 - F_D(30/x) = 1 - \frac{(30/x)^2}{81} = 1 - \frac{100}{9x^2}$.

Differentiating $F_X(x)$, we have $f_X(x) = 0$ for $x < 10/3$; otherwise, $f_X(x) = 200/(9x^3)$.


10. (a) The event “no cost” means that $X < 24 \times 3 = 72$. Thus, the probability is

$P(X \le 72) = \int_{48}^{72} f(x)\,dx = \int_{48}^{72} 0.02e^{-0.02(x-48)}\,dx = \left[-e^{-0.02(x-48)}\right]_{48}^{72} = 1 - e^{-0.02 \times 24} = 0.3812$.

(b) The additional cost is between $400 and $800. This means that it takes more than 4 days but less than 7 days for the fixture to arrive. That is, 4 x 24 < X < 7 x 24, or 96 < X < 168. Thus, the corresponding probability is

$P(96 < X < 168) = \int_{96}^{168} f(x)\,dx = \int_{96}^{168} 0.02e^{-0.02(x-48)}\,dx = \left[-e^{-0.02(x-48)}\right]_{96}^{168} = e^{-0.02 \times 48} - e^{-0.02 \times 120} = 0.2922$.


11. When using a sample size of 100, the resulting histogram superimposed with the PDF is given below.

[Figure: Histogram of runif(100); horizontal axis runif(100), vertical axis Density.]

When using a sample size of 1000, the resulting histogram superimposed with the PDF is given below.

[Figure: Histogram of runif(1000).]

When using a sample size of 10000, the resulting histogram superimposed with the PDF is given below.

[Figure: Histogram of runif(10000); horizontal axis runif(10000).]

When using a sample size of 100000, the resulting histogram superimposed with the PDF is given below.

[Figure: Histogram of runif(1e+05).]

From these figures, we can see that when the sample size is only 100, the histogram is not reasonably close to the PDF curve. It seems that if the sample size is larger than 1000, the histogram starts to get close to the PDF.


3.3 Parameters of Probability Distributions 


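1. (a) The random variable $X$ can take 0, 1, 2, 3. The PMF can be calculated as

$p(0) = P(X = 0) = \binom{4}{0}\binom{16}{3}/\binom{20}{3} = 0.4912$, $p(1) = P(X = 1) = \binom{4}{1}\binom{16}{2}/\binom{20}{3} = 0.4211$,
$p(2) = P(X = 2) = \binom{4}{2}\binom{16}{1}/\binom{20}{3} = 0.0842$, $p(3) = P(X = 3) = \binom{4}{3}\binom{16}{0}/\binom{20}{3} = 0.0035$.

(b) $E(X) = \sum_x x\,p(x) = 0 \times 0.4912 + 1 \times 0.4211 + 2 \times 0.0842 + 3 \times 0.0035 = 0.6$,
$\text{Var}(X) = E(X^2) - E(X)^2 = 0 \times 0.4912 + 1 \times 0.4211 + 4 \times 0.0842 + 9 \times 0.0035 - 0.6^2 = 0.429$.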


2. (a) $E(X) = 1 \times 0.4 + 2 \times 0.3 + 3 \times 0.1 + 4 \times 0.2 = 2.1$, and $E(1/X) = 1/1 \times 0.4 + 1/2 \times 0.3 + 1/3 \times 0.1 + 1/4 \times 0.2 = 0.63333$.


(b) We need to compare the expectations of $1000/E(X)$ and $1000/X$: $E(1000/E(X)) = 1000/E(X) = 1000/2.1 = 476.19$, while $E(1000/X) = 1000E(1/X) = 1000 \times 0.63333 = 633.33$. Thus, the player should choose $1000/X$.
3. (a) The sample space of X is S_X = {0, 400, 750, 800, 1150, 1500}. We can find the PMF as

p(0) = P(both do not buy TV) = 0.7 x 0.7 = 0.49,

p(400) = P(one buys $400 TV, the other one does not buy TV) = 2 x 0.7 x 0.3 x 0.4 = 0.168,

p(750) = P(one buys $750 TV, the other one does not buy TV) = 2 x 0.7 x 0.3 x 0.6 = 0.252,

p(800) = P(both buy $400 TV) = 0.3 x 0.4 x 0.3 x 0.4 = 0.0144,

p(1150) = P(one buys $400 TV, the other buys $750 TV) = 2 x 0.3 x 0.4 x 0.3 x 0.6 = 0.0432,

and

p(1500) = P(both buy $750 TV) = 0.3 x 0.6 x 0.3 x 0.6 = 0.0324.


(b) E(X) = sum of x p(x) = 0 x 0.49 + 400 x 0.168 + 750 x 0.252 + 800 x 0.0144 + 1150 x 0.0432 + 1500 x 0.0324 = 366.
Var(X) = 0^2 x 0.49 + 400^2 x 0.168 + 750^2 x 0.252 + 800^2 x 0.0144 + 1150^2 x 0.0432 + 1500^2 x 0.0324 - 366^2 = 173,922.
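
The PMF, mean, and variance in Problem 3 can be rebuilt from the single-customer distribution. The Python sketch below is an illustration that assumes, as the products in the solution do, that the two customers act independently: each buys nothing with probability 0.7, the $400 TV with probability 0.3 x 0.4, and the $750 TV with probability 0.3 x 0.6.

```python
from itertools import product

# Spending distribution of one customer.
single = {0: 0.7, 400: 0.3 * 0.4, 750: 0.3 * 0.6}

# Distribution of the total spent by two independent customers.
pmf = {}
for (x1, p1), (x2, p2) in product(single.items(), repeat=2):
    pmf[x1 + x2] = pmf.get(x1 + x2, 0.0) + p1 * p2

mean = sum(x * p for x, p in pmf.items())
variance = sum(x * x * p for x, p in pmf.items()) - mean ** 2
print(sorted(pmf.items()))
print(round(mean, 2), round(variance, 2))
```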


4. (a) $E(X) = 0 \times 0.05 + 1 \times 0.1 + 2 \times 0.15 + 3 \times 0.25 + 4 \times 0.35 + 5 \times 0.1 = 3.05$, and $E(X^2) = 0^2 \times 0.05 + 1^2 \times 0.1 + 2^2 \times 0.15 + 3^2 \times 0.25 + 4^2 \times 0.35 + 5^2 \times 0.1 = 11.05$; thus, $\text{Var}(X) = E(X^2) - E(X)^2 = 1.7475$.


(b) Let $Y$ be the bonus; then $Y = 15000X$, thus $E(Y) = E(15000X) = 15000 \times 3.05 = 45750$, and $\text{Var}(Y) = \text{Var}(15000X) = 15000^2\,\text{Var}(X) = 15000^2 \times 1.7475 = 393187500$.


5. Use the commands g=function(x){0.01*x*x*exp(-0.1*x)}; integrate(g, lower=0, upper=Inf) to find $E(X)$, and the result is 20. Similarly, we find $E(X^2) = 600$, thus $\sigma_X^2 = 600 - 20^2 = 200$.
6. (a) Since $T = \tilde{T} + 5$ and $\tilde{T} > 0$, the sample space of $T$ is $(5, \infty)$. The CDF of $T$ is $F_T(t) = 0$ if $t \le 5$; for $t > 5$,

$F_T(t) = P(T \le t) = P(\tilde{T} \le t - 5) = \int_0^{t-5} 0.1e^{-0.1x}\,dx = 1 - e^{-0.1(t-5)}$.

Differentiating the CDF of $T$, we get the PDF of $T$ as

$f_T(t) = \begin{cases} 0.1e^{-0.1(t-5)}, & \text{if } t > 5 \\ 0, & \text{otherwise.} \end{cases}$

(b) The expected cost is

$\int_5^{15} 5(15 - t)\,0.1e^{-0.1(t-5)}\,dt + \int_{15}^{\infty} 10(t - 15)\,0.1e^{-0.1(t-5)}\,dt = 55.1819$.

The two integrals are calculated by the commands
g=function(x){5*(15-x)*0.1*exp(-0.1*x+0.5)}
integrate(g, lower=5, upper=15)
and
g=function(x){10*(x-15)*0.1*exp(-0.1*x+0.5)}
integrate(g, lower=15, upper=Inf)
Thus, the company's plan to delay the work on the project does reduce the expected cost.
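
The expected cost of 55.1819 in Problem 6 can be double-checked without R's integrate. The Python sketch below is a rough illustration, not the manual's method: it applies a composite Simpson rule and truncates the infinite upper limit at 500 (an assumption that is harmless here because the integrand decays exponentially). The piecewise cost function, 5(15 - t) before t = 15 and 10(t - 15) after, is taken from the R commands above.

```python
from math import exp

def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def pdf(t):
    """PDF of T for t > 5: a shifted exponential with rate 0.1."""
    return 0.1 * exp(-0.1 * (t - 5))

early = simpson(lambda t: 5 * (15 - t) * pdf(t), 5, 15)
late = simpson(lambda t: 10 * (t - 15) * pdf(t), 15, 500)  # truncated upper limit
expected_cost = early + late
print(round(expected_cost, 4))
```

For comparison, both integrals have closed forms, 50/e and 100/e, so the total equals 150/e.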


7. (a) To find the median, solve the equation $F(\tilde{\mu}) = 0.5$, which is $\tilde{\mu}^2/4 = 0.5$, and results in $\tilde{\mu} = \sqrt{2}$. The 25th percentile is the solution to $F(x_{0.25}) = 0.25$, which is $x_{0.25}^2/4 = 0.25$, and results in $x_{0.25} = 1$. The 75th percentile is the solution to $F(x_{0.75}) = 0.75$, which is $x_{0.75}^2/4 = 0.75$, and results in $x_{0.75} = \sqrt{3}$. Thus, the IQR is $x_{0.75} - x_{0.25} = \sqrt{3} - 1 = 0.732$.

(b) We can get the PDF $f(x) = x/2$ for $0 < x \le 2$, and otherwise, $f(x) = 0$. So

$E(X) = \int_0^2 x f(x)\,dx = \int_0^2 \frac{x^2}{2}\,dx = 4/3 = 1.333$,

and

$\sigma_X^2 = \int_0^2 x^2 f(x)\,dx - E(X)^2 = \int_0^2 \frac{x^3}{2}\,dx - E(X)^2 = 2 - \frac{16}{9} = \frac{2}{9}$.

Thus, $\sigma_X = \sqrt{2}/3 = 0.4714$.
8. (a) Since $Y = 60X$ and the sample space for $X$ is $(0, 3)$, the sample space for $Y$ is $(0, 180)$. Thus, the CDF of $Y$ is $F_Y(y) = 0$ if $y \le 0$, $F_Y(y) = 1$ if $y \ge 180$; for $y \in (0, 180)$, we have

$F_Y(y) = P(Y \le y) = P(60X \le y) = P(X \le y/60) = F_X(y/60) = \frac{\log(1 + y/60)}{\log 4}$.

Thus, by differentiating $F_Y(y)$, we have the PDF of $Y$ as

$f_Y(y) = \begin{cases} \frac{1}{(60 + y)\log 4}, & \text{if } 0 < y < 180 \\ 0, & \text{otherwise.} \end{cases}$

(b) $V = h(Y)$, where $h(y) = 0$ for $0 < y \le 120$, and $h(y) = 200 + 6(y - 120)$ for $y > 120$. Then we can calculate

$E(V) = E(h(Y)) = \int_{-\infty}^{\infty} h(y) f_Y(y)\,dy = \int_{120}^{180} (200 + 6(y - 120)) \frac{1}{(60 + y)\log 4}\,dy = 77.0686$.

The integral can be calculated using the following R commands

g=function(y) (200+6*(y-120))/((60+y)*log(4))
integrate(g, lower=120, upper=180)

Similarly,

$E(V^2) = E(h(Y)^2) = \int_{-\infty}^{\infty} h(y)^2 f_Y(y)\,dy = \int_{120}^{180} (200 + 6(y - 120))^2 \frac{1}{(60 + y)\log 4}\,dy = 30859.97$.

The integral can be calculated using the following R commands
g=function(y) (200+6*(y-120))^2/((60+y)*log(4))
integrate(g, lower=120, upper=180)
As a result, $\sigma_V^2 = E(V^2) - E(V)^2 = 30859.97 - 77.0686^2 = 24920.4$.
(c) Let $D$ be the fine expressed in dollars; then $D = V/100$. Thus, $E(D) = E(V)/100 = 0.7707$, and $\sigma_D^2 = \sigma_V^2/100^2 = 2.492$.


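9. (a)
$E(P) = \int_0^1 p f_P(p)\,dp = \int_0^1 \theta p^{\theta}\,dp = \frac{\theta}{\theta + 1} p^{\theta + 1}\Big|_0^1 = \frac{\theta}{\theta + 1}$,

and

$E(P^2) = \int_0^1 p^2 f_P(p)\,dp = \int_0^1 \theta p^{\theta + 1}\,dp = \frac{\theta}{\theta + 2} p^{\theta + 2}\Big|_0^1 = \frac{\theta}{\theta + 2}$,

hence

$\sigma_P^2 = E(P^2) - E(P)^2 = \frac{\theta}{\theta + 2} - \left(\frac{\theta}{\theta + 1}\right)^2 = \frac{\theta}{(\theta + 2)(\theta + 1)^2}$.

(b) Clearly, if $p \le 0$, $F_P(p) = 0$, and if $p \ge 1$, $F_P(p) = 1$. If $0 < p < 1$,

$F_P(p) = \int_0^p f_P(t)\,dt = \int_0^p \theta t^{\theta - 1}\,dt = p^{\theta}$.

(c) Denote the 25th percentile and 75th percentile as $p_{0.25}$ and $p_{0.75}$, respectively. Then $F_P(p_{0.25}) = 0.25$ and $F_P(p_{0.75}) = 0.75$, or $p_{0.25}^{\theta} = 0.25$ and $p_{0.75}^{\theta} = 0.75$. Thus, $p_{0.25} = 0.25^{1/\theta}$ and $p_{0.75} = 0.75^{1/\theta}$. So $\text{IQR} = p_{0.75} - p_{0.25} = 0.75^{1/\theta} - 0.25^{1/\theta}$.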


3.4 Models for Discrete Random Variables 


1. (a) $X$ is a Binomial R.V.

(b) The sample space is $S_X = \{0, 1, \ldots, 5\}$ and the PMF is $p(x) = \binom{5}{x} 0.3^x 0.7^{5-x}$ for $x = 0, 1, 2, 3, 4, 5$.
(c) $E(X) = 5 \times 0.3 = 1.5$ and $\text{Var}(X) = 5 \times 0.3 \times 0.7 = 1.05$.
(d)

(i) The probability that there are more than 2 fails, $P(X > 2)$, can be calculated by the R command 1-pbinom(2, 5, 0.3), which gives the result 0.163.

(ii) Let $Y$ be the cost from failed grafts; then $Y = 9X$. Thus, $E(Y) = E(9X) = 9E(X) = 13.5$, and $\text{Var}(Y) = \text{Var}(9X) = 81\,\text{Var}(X) = 85.05$.


2. (a) $X$ is a Binomial R.V., $X \sim \text{Bin}(15, 0.3)$.
(b) $E(X) = 15 \times 0.3 = 4.5$ and $\text{Var}(X) = 15 \times 0.3 \times 0.7 = 3.15$.

(c) Use dbinom(6, 15, 0.3) for $P(X = 6)$, which gives 0.147236. Use 1-pbinom(5, 15, 0.3) for $P(X \ge 6)$, which gives 0.2783786.


3. (a) $X$ is a Binomial R.V.

(b) The parameters are $n = 20$ and $p = 0.01$. $P(\text{refunded}) = P(X \ge 2)$ and can be calculated using the R command 1-pbinom(1, 20, 0.01), which gives 0.01685934.
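
For readers without R at hand, the binomial values quoted in Problems 1 to 3 can be reproduced with a few lines of Python. The functions below are hypothetical stand-ins that mirror the argument order of R's dbinom and pbinom; they are a sketch, not a replacement for the R functions.

```python
from math import comb

def dbinom(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def pbinom(q, n, p):
    """P(X <= q) for X ~ Bin(n, p), by summing the PMF."""
    return sum(dbinom(k, n, p) for k in range(q + 1))

p_more_than_2_fails = 1 - pbinom(2, 5, 0.3)   # Problem 1(d)(i)
p_exactly_6 = dbinom(6, 15, 0.3)              # Problem 2(c)
p_at_least_6 = 1 - pbinom(5, 15, 0.3)         # Problem 2(c)
p_refunded = 1 - pbinom(1, 20, 0.01)          # Problem 3(b)
print(p_more_than_2_fails, p_exactly_6, p_at_least_6, p_refunded)
```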
4. (a) $X$ is a Binomial R.V. with $n = 10$ and $p = 0.5$. Thus, $E(X) = 10 \times 0.5 = 5$ and $\text{Var}(X) = 10 \times 0.5 \times 0.5 = 2.5$.


(b) $P(X = 5)$ can be calculated using the R command dbinom(5, 10, 0.5), which gives 0.2461.


(c) This is $P(X \le 5) = F(5)$, which can be calculated by the command pbinom(5, 10, 0.5), and the result is 0.6230.


(d) $Y$ is the total number of incorrectly answered questions.

(e) $P(2 \le Y \le 5) = P(2 \le 10 - X \le 5) = P(5 \le X \le 8) = P(X \le 8) - P(X \le 4) = F(8) - F(4)$. This can be calculated by the command pbinom(8, 10, 0.5)-pbinom(4, 10, 0.5), and the result is 0.6123.


5. (a) $X$ is a Binomial R.V. with $n = 10$ and $p = 0.9$.
(b) $E(X) = 10 \times 0.9 = 9$ and $\text{Var}(X) = 10 \times 0.9 \times 0.1 = 0.9$.


(c) $P(X \ge 7) = 1 - P(X \le 6)$; the R command is 1-pbinom(6, 10, 0.9), and the result is 0.9872.


(d) Let $Y$ be the catering cost; then $Y = 100 + 10X$, thus $E(Y) = 100 + 10E(X) = 190$, and $\text{Var}(Y) = 100\,\text{Var}(X) = 90$.


(a) Let X be the number of guilty votes when the defendant is guilty, then X has 
a Binomial distribution with n = 9 and p = 0.9. In order to convict a guilty 
defendant, we must have X > 4. Thus, the probability of convicting is P(X > 
4) =1—P(X < 4), which could be calculated by 1- pbinom(4,9,0.9) resulting 
in 0.9991. Similarly, the probability of convicting an innocent defendant is / - 
pbinom(4,10,0.1) = 0.00089. Thus, the proportion of all defendants convicted 
is 0.4 x 0.00089 + 0.6 x 0.9991 = 0.5998. 


(b) Let G and I be the events that the defendant is guilty and innocent, respectively, and let VG and VI be the events that the defendant is voted guilty and innocent, respectively. Then P(G) = 0.6, P(I) = 0.4, and P(VG|G) = 0.9991 from part (a). Let Y be the number of guilty votes when the defendant is innocent. Then Y has a binomial distribution with n = 9 and p = 0.1. Thus, P(VI|I) = P(Y ≤ 4), which can be calculated by pbinom(4, 9, 0.1), resulting in 0.9991. Thus,

P(Correct) = P((VG ∩ G) ∪ (VI ∩ I)) = P(VG|G)P(G) + P(VI|I)P(I)
= 0.9991 x 0.6 + 0.9991 x 0.4 = 0.9991.
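The jury calculation can be checked numerically. The following is a minimal sketch, assuming the setup read off the solution (9 jurors, each voting correctly with probability 0.9, conviction when more than 4 vote guilty, and P(guilty) = 0.6); the variable names are illustrative only.

```r
# Assumed setup taken from the solution text, not from the exercise statement.
n <- 9
p_guilty <- 0.6

p_convict_guilty   <- 1 - pbinom(4, n, 0.9)  # P(X > 4), X ~ Bin(9, 0.9)
p_convict_innocent <- 1 - pbinom(4, n, 0.1)  # P(Y > 4), Y ~ Bin(9, 0.1)

# Part (a): proportion of all defendants convicted (law of total probability)
p_convicted <- p_guilty * p_convict_guilty +
  (1 - p_guilty) * p_convict_innocent

# Part (b): proportion of correct verdicts
p_correct <- p_guilty * p_convict_guilty +
  (1 - p_guilty) * (1 - p_convict_innocent)

print(round(c(convicted = p_convicted, correct = p_correct), 4))
```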


7. (a) The R.V. X follows a negative binomial distribution with parameters r = 1 and p = 0.3.

(b) The sample space is S = {1, 2, ...}, and p(x) = P(X = x) = p(1 − p)^(x−1).
(c) E(X) = 1/p = 3.33, and Var(X) = (1 − p)/p^2 = 7.778.
8. (a) The R.V. X follows a negative binomial distribution with parameters r = 5 and p = 0.05.

(b) The sample space is S = {5, 6, ...}, and $p(x) = P(X = x) = \binom{x-1}{4} p^5 (1-p)^{x-5}$.
(c) The R command is 1-pnbinom(30, 5, 0.05) and it gives 0.971.


9. (a) Let X be the number of games needed for team A to win three times; then X has the negative binomial distribution with r = 3 and p = 0.6. Team A will win the series if X = 3 or X = 4 or X = 5. Thus, the probability can be calculated using the command sum(dnbinom(0:2, 3, 0.6)), which gives 0.6826.


(b) The probability for a better team to win a best-of-five series is larger. With 
more games played, the better team will win more games. 

10. (a) The R.V. X follows a negative binomial distribution with parameters r = 1 and p = 0.01.

(b) The sample space is S = {1, 2, ...} and p(x) = P(X = x) = p(1 − p)^(x−1).

(c) E(X) = 1/p = 100.

11. (a) The R.V. Y follows a negative binomial distribution with parameters r = 5 and p = 0.01.

(b) E(Y) = r/p = 500 and Var(Y) = r(1 − p)/p^2 = 49500.
12. (a) The R.V. X follows a hypergeometric distribution with parameters n = 5, M1 = 6 and M2 = 9, so N = M1 + M2 = 15.
(b) The sample space is S_X = {0, 1, 2, ..., 5} and the PMF is
\[ p(x) = P(X = x) = \frac{\binom{6}{x}\binom{9}{5-x}}{\binom{15}{5}}. \]

(c) The R command is sum(dhyper(2:4, 6, 9, 5)) and it gives 0.7043.
(d) E(X) = nM1/N = 5 x 6/15 = 2, and
\[ \mathrm{Var}(X) = n\,\frac{M_1}{N}\left(1 - \frac{M_1}{N}\right)\frac{N-n}{N-1} = 5 \times \frac{6}{15}\left(1 - \frac{6}{15}\right)\frac{15-5}{15-1} = 0.857. \]


13. (a) The R.V. X follows a hypergeometric distribution with parameters n = 5, M1 = 3 and M2 = 17, so N = M1 + M2 = 20.

(b) The sample space is S_X = {0, 1, 2, 3} and the PMF is
\[ p(x) = P(X = x) = \frac{\binom{3}{x}\binom{17}{5-x}}{\binom{20}{5}}. \]

(c) The R command is dhyper(1, 3, 17, 5) and it gives 0.4605.

(d) E(X) = nM1/N = 5 x 3/20 = 0.75, and
\[ \mathrm{Var}(X) = n\,\frac{M_1}{N}\left(1 - \frac{M_1}{N}\right)\frac{N-n}{N-1} = 5 \times \frac{3}{20}\left(1 - \frac{3}{20}\right)\frac{20-5}{20-1} = 0.503. \]
14. (a) The R. V. X follows Hypergeometric distribution with parameter n = 20, 
M, = 200 and M2 = 800. N = M,; + M2 = 1000. 
(b) The R command is phyper(4, 200, 800, 20) and it gives 0.6301. 
(c) We can use the Binomial distribution with parameter n = 20 and p= M,/N = 
0.2 to approximate the distribution of X. 
(d) The R command is pbinom(4, 20, 0.2) and it gives 0.6296. The result is very 
close to that in part (b). 
15. (a) The R. V. X follows Hypergeometric distribution with parameter n = 50, 
M, = 300 and Mz = 9700. N = M, + Mz = 10000. 
(b) The R command is phyper(3, 300, 9700, 50) and it gives 0.9377. 
(c) We can use the Binomial distribution with parameter n = 50 and p= M,/N = 
0.03 to approximate the distribution of X. 
(d) The R command is pbinom(3, 50, 0.03) and it gives 0.9372. The result is very 
close to that in part (b). 
16. (a) Poisson. 
(b) E(Y) = Var(Y) = λ = 1800 x 0.6 = 1080.
(c) The R command is 1-ppois(1100, 1080) and it gives 0.2654. 

17. Let X be the number of loads during the next quarter; then X has a Poisson distribution with λ = 0.5. We are looking for P(X > 2), and the R command 1-ppois(2, 0.5) gives 0.0144.

18. (a) The R.V. X follows a Poisson distribution with parameter λ = 1.6 x 3 = 4.8.

(b) E(X) = Var(X) = λ = 4.8.
(c) The R command ppois(9, 4.8)-ppois(4, 4.8) gives 0.4986.
(d) Y = 5000X, thus E(Y) = 5000E(X) = 24000 and Var(Y) = 5000^2 Var(X) = 1.2 x 10^8.
19. (a) Var(X1) = 2.6 and Var(X2) = 3.8. 


(b) Let T1 and T2 be the events that the article is handled by typesetter 1 and 2, respectively. Then

\[ P(\text{No error} \mid T_1) = e^{-2.6}\,\frac{2.6^0}{0!} = e^{-2.6} \quad\text{and}\quad P(\text{No error} \mid T_2) = e^{-3.8}\,\frac{3.8^0}{0!} = e^{-3.8}. \]

Thus
P(No error) = P(No error|T1)P(T1) + P(No error|T2)P(T2)
= e^{-2.6} x 0.6 + e^{-3.8} x 0.4 = 0.0535.
(c) By Bayes' Rule

\[ P(T_2 \mid \text{No error}) = \frac{P(\text{No error} \mid T_2)P(T_2)}{P(\text{No error})} = \frac{e^{-3.8} \times 0.4}{0.0535} = 0.1672. \]
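The total-probability and Bayes steps for the two typesetters can be reproduced in a few lines. This is a sketch under the values used in the solution (Poisson means 2.6 and 3.8, prior probabilities 0.6 and 0.4); the object names are made up for illustration.

```r
lambda <- c(2.6, 3.8)   # Poisson means for typesetters 1 and 2 (from part (a))
prior  <- c(0.6, 0.4)   # P(T1), P(T2) as used in the solution

p_no_error_given <- dpois(0, lambda)              # e^{-lambda} for each typesetter
p_no_error       <- sum(p_no_error_given * prior) # law of total probability
posterior        <- p_no_error_given * prior / p_no_error  # Bayes' rule

print(round(p_no_error, 4))
print(round(posterior, 4))   # second entry is P(T2 | No error)
```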


20. (a) The R.V. X follows a binomial distribution with parameters n = 1500 and p = 0.002.

(b) The R command is sum(dbinom(4:8, 1500, 0.002)) and it gives 0.3490.

(c) The distribution of X can be approximated by Poisson with parameter λ = np = 3.

(d) The R command is sum(dpois(4:8, 3)) and it gives 0.3490.


(e) The exact probability of no faulty is calculated by dbinom(0, 1500, 0.002), 
which gives 0.0496, and the approximate probability of no faulty is calculated 
by dpois(0, 3), which results in 0.0498. 


21. (a) The random variable Y has a hypergeometric(300, 9700, 200) distribution,
which can be approximated by a binomial(200, 0.03) distribution, which can 
be approximated by a Poisson(6) distribution. 


(b) The R command for the exact probability is phyper(10, 300, 9700, 200) which 
gives 0.9615; the R command for binomial approximation is pbinom(10, 200, 
0.03) and the result is 0.9599; the R command for Poisson approximation is 
ppois(10, 6) and the result is 0.9574. The two approximations are quite good. 
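The chain of approximations in this exercise can be compared side by side. The sketch below simply collects the three R commands quoted above into one vector; nothing beyond those commands is assumed.

```r
# Exact hypergeometric probability and its two successive approximations
exact   <- phyper(10, 300, 9700, 200)      # 300 of one kind, 9700 of the other, 200 drawn
binom   <- pbinom(10, 200, 300 / 10000)    # binomial with p = M1 / N = 0.03
poisson <- ppois(10, 200 * 300 / 10000)    # Poisson with lambda = n * p = 6

probs <- c(exact = exact, binomial = binom, poisson = poisson)
print(round(probs, 4))
print(round(abs(probs - exact), 4))        # absolute approximation errors
```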


22. (a) Each fish has a small probability of being caught. It makes sense to assume
that each fish behaves independently, thus we have a large number of Bernoulli 
trials with a small probability of success. As a consequence of Proposition 3.4- 
1, the number of fish caught by an angler is modeled as a Poisson random 
variable. 


(b) The probability of each disabled vehicle being abandoned on I95 is small and it makes sense to assume that each vehicle owner behaves independently. Thus we have a large number of Bernoulli trials with a small probability of success. As a consequence of Proposition 3.4-1, the number of disabled vehicles being abandoned on I95 in one year is modeled as a Poisson random variable.


(c) Each person has a small probability of dialing a wrong telephone number and 
it makes sense to assume that each person behaves independently. Thus, we 
have a large number of Bernoulli trials with a small probability of success. As 
a consequence of Proposition 3.4-1, the number of wrongly dialed number in 
a city in one hour is modeled as a Poisson random variable. 
(d) Each person has a small probability of living 100 years and it makes sense to
assume that each person behaves independently. Thus, we have a large number 
of Bernoulli trials with a small probability of success. As a consequence of 
Proposition 3.4-1, the number of people who reach age 100 in a city is modeled 
as a Poisson random variable. 


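23. (a) Both events mean that one event occurred in [0, t] and no event occurred in (t, 1].
(b) From Proposition 3.4-2, X(0.6) has a Poisson distribution with λ1 = 2 x 0.6 = 1.2 and X(1) − X(0.6) has a Poisson distribution with λ2 = 2 x 0.4 = 0.8. Furthermore, X(0.6) and X(1) − X(0.6) are independent, thus

\[ P(X(0.6) = 1,\ X(1) - X(0.6) = 0) = e^{-1.2}\,\frac{1.2^1}{1!} \times e^{-0.8}\,\frac{0.8^0}{0!} = 0.1624. \]
(c)
(i) Both T ≤ t and X(t) = 1 say that the event happened before or at time t.
(ii) The proof uses Proposition 3.4-2 and the results in (i) and Part (a). Writing α for the rate of the process,

\begin{align*}
P(T \le t \mid X(1) = 1) &= \frac{P(T \le t,\ X(1) = 1)}{P(X(1) = 1)} = \frac{P(X(t) = 1,\ X(1) = 1)}{P(X(1) = 1)} \\
&= \frac{P(X(t) = 1,\ X(1) - X(t) = 0)}{P(X(1) = 1)} \\
&= \frac{P(X(t) = 1)\,P(X(1) - X(t) = 0)}{P(X(1) = 1)} \\
&= \frac{e^{-\alpha t}\,\dfrac{(\alpha t)^1}{1!} \cdot e^{-\alpha(1-t)}\,\dfrac{(\alpha(1-t))^0}{0!}}{e^{-\alpha \times 1}\,\dfrac{(\alpha \times 1)^1}{1!}} = t.
\end{align*}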


3.5 Models for Continuous Random Variables 


1. Let T be the life time. From the problem, we know that T has an exponential distribution with parameter λ = 1/6.
(a) P(T > 4) = 1 − P(T ≤ 4) = 1 − F(4) = 1 − (1 − e^{−4/6}) = 0.513.

(b) Var(T) = 1/λ^2 = 36. To find the 95th percentile, we solve the equation F(t_0.95) = 0.95, or 1 − e^{−t_0.95/6} = 0.95, which gives t_0.95 = 17.9744.

(c) Let T_r be the remaining life time. By the memoryless property of the exponential distribution, T_r still has an exponential distribution with λ = 1/6. Thus,

(i) P(T_r > 5) = e^{−5/6} = 0.4346.
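These exponential calculations map directly onto R's pexp and qexp. The sketch below assumes the rate λ = 1/6 and the thresholds 4 and 5 as they appear in the solution; the variable names are illustrative.

```r
rate <- 1 / 6                                   # lambda from the solution

p_gt_4 <- pexp(4, rate, lower.tail = FALSE)     # P(T > 4) = e^{-4/6}
var_T  <- 1 / rate^2                            # Var(T) = 1 / lambda^2
t95    <- qexp(0.95, rate)                      # 95th percentile

# Memoryless property: the remaining life has the same Exp(1/6) distribution
p_rem_gt_5 <- pexp(5, rate, lower.tail = FALSE)

print(round(c(p_gt_4 = p_gt_4, var_T = var_T, t95 = t95,
              p_rem_gt_5 = p_rem_gt_5), 4))
```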
2. Let T1 be the time, in months, of the first arrival; then T1 ~ Exp(1).

(a) We are looking for P(7/30 < T1 < 14/30). By the CDF of the exponential distribution, we have

\[ P\left(\frac{7}{30} < T_1 < \frac{14}{30}\right) = F\left(\frac{14}{30}\right) - F\left(\frac{7}{30}\right) = \exp\left(-1 \times \frac{7}{30}\right) - \exp\left(-1 \times \frac{14}{30}\right) = 0.1648. \]

(b) Let T2 be the remaining waiting time. By the memoryless property of the exponential distribution, T2 ~ Exp(1). Thus, E(T2) = 1, and Var(T2) = 1.


3. From (3.5.3), P(X > s + t|X > s) = P(X > t), we must have 1 − P(X > s + t|X > s) = 1 − P(X > t), or P(X ≤ s + t|X > s) = P(X ≤ t) = F(t). Using the expression for F(t) given in (3.5.1), we have P(X ≤ s + t|X > s) = 1 − e^{−λt}.


4. When using a sample size of 10,000, the resulting histogram superimposed with the PDF is given below. This shows that the histogram provides a very close approximation to the PDF.

[Figure: Histogram of rexp(10000) with the PDF superimposed; horizontal axis: rexp(10000).]
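A plot of this kind can be produced with a few lines of R. This is a hedged sketch rather than the commands used for the original figure: the seed, the number of bins, and the object names are arbitrary choices made here.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 10000                       # change to 1000 for the smaller sample
x <- rexp(n)                     # Exp(1) sample

# Density-scale histogram so that it is comparable with the PDF
h <- hist(x, freq = FALSE, breaks = 50,
          main = paste0("Histogram of rexp(", n, ")"))
curve(dexp(x), from = 0, to = max(x), add = TRUE)   # superimpose the PDF

sample_mean <- mean(x)           # should be close to the true mean 1
print(sample_mean)
```

Rerunning with n <- 1000 gives a visibly rougher histogram that still follows the PDF.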


When using a sample size of 1000, the resulting histogram superimposed with the PDF is given below. The histogram is still reasonably close to the PDF.

[Figure: Histogram of rexp(1000) with the PDF superimposed; horizontal axis: rexp(1000).]

5. (a) We could calculate this value by the R command qnorm(0.25, 43, 4.5), which gives 39.9648.


(b) We could calculate this value by the R command qnorm(0.9, 43, 4.5), which gives 48.76698.

(c) According to the requirement, 43 + c must be the 99.5th percentile, which can be calculated using the command qnorm(0.995, 43, 4.5), resulting in 54.59123. Thus, c = 11.59123.


(d) The probability of a randomly selected A36 steel having strength less than 43 is 0.5. Let Y be the number of A36 steels having strength less than 43 among 15 randomly selected steels; then we are calculating P(Y ≤ 3). This can be calculated using the R command sum(dbinom(0:3, 15, 0.5)) and the result is 0.01757812.


6. (a) We could calculate this value by the R command 1-pnorm(8.64, 8, 0.5), which gives 0.100.

(b) The probability is 0.1^3 = 0.001.

7. (a) We could calculate this value by the R command pnorm(9.8, 9, 0.4)-pnorm(8.6, 9, 0.4), which gives 0.8185946.

(b) Let Y be the number of acceptable resistors; then Y has the binomial distribution with n = 4 and probability of success as calculated in part (a). By the binomial probability formula, P(Y = 2) = 0.1323. The same result is obtained with the R command dbinom(2, 4, 0.8185946).
8. (a) We could calculate this value by the R command 1-pnorm(600, 500, 80), which gives 0.1056498.

(b) We could calculate this value by the R command qnorm(0.99, 500, 80), which gives 686.1078.


9. (a) We could calculate this value by the R command qnorm(0.1492, 10, 0.03), which gives 9.968804.

(b) We could calculate this value by the R command pnorm(10.06, 10, 0.03), which gives 0.9772499.

10. (a) We could calculate this value by the R command pnorm(7.9, 10, 2), which gives 0.1468591.

(b) We could calculate this value by the R command qnorm(0.3, 10, 2), which gives 8.951199.


11. (a) The resulting normal Q-Q plot is shown below.

[Figure: Normal Q-Q Plot; horizontal axis: Theoretical Quantiles.]


The plotted points are not close to the straight line, thus it indicates that the 
data does not come from a normal distribution. 


(b) The resulting normal Q-Q plot is given below.

[Figure: Normal Q-Q Plot; horizontal axis: Theoretical Quantiles; vertical axis: Sample Quantiles.]


The plotted points are not close to the straight line, thus it indicates that the 
data does not come from a normal distribution. 


12. (a) If T ~ log-norm(μ_ln, σ_ln), the CDF is F_T(t) = 0 for t ≤ 0. For t > 0,

\[ F_T(t) = P(T \le t) = P(\log T \le \log t) = \Phi\left(\frac{\log t - \mu_{\ln}}{\sigma_{\ln}}\right), \]

since log T ~ N(μ_ln, σ_ln^2). This is what was to be proved.


(b) The three PDFs are given as

[Figure: Log-Normal PDFs.]

The three CDFs are given as

[Figure: Log-Normal CDFs.]


(c) For log-norm(0,1), we have μ = √e = 1.6487 and σ^2 = e(e − 1) = 4.6708. For log-norm(5,1), we have μ = e^5 √e = 244.692 and σ^2 = e^11 (e − 1) = 102880.6. For log-norm(5,2), we have μ = e^7 = 1096.633 and σ^2 = e^14 (e^4 − 1) = 64457365.

(d) The R commands log(qlnorm(0.95)) and qnorm(0.95) indeed return the same 
value. The reason is that if T is a log-norm(0,1) random variable, then log T 
is a standard normal random variable. 
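The moments in part (c) follow from the standard log-normal formulas μ = exp(μ_ln + σ_ln²/2) and σ² = exp(2μ_ln + σ_ln²)(exp(σ_ln²) − 1). The helper below, whose name lnorm_moments is made up for this sketch, evaluates them and also checks the identity in part (d).

```r
# Mean and variance of a log-normal(mu_ln, sigma_ln) random variable
lnorm_moments <- function(mu_ln, sigma_ln) {
  m <- exp(mu_ln + sigma_ln^2 / 2)
  v <- exp(2 * mu_ln + sigma_ln^2) * (exp(sigma_ln^2) - 1)
  c(mean = m, var = v)
}

m01 <- lnorm_moments(0, 1)
m51 <- lnorm_moments(5, 1)
m52 <- lnorm_moments(5, 2)
print(rbind(m01, m51, m52))

# Part (d): log of a log-norm(0,1) quantile equals the standard normal quantile
same_quantile <- isTRUE(all.equal(log(qlnorm(0.95)), qnorm(0.95)))
print(same_quantile)
```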
13. (a) The three PDFs are given as

[Figure: Gamma PDFs.]

The three CDFs are given as

[Figure: Gamma CDFs.]

(b) For gamma(2,1), μ = 2 and σ^2 = 2. For gamma(2,2), μ = 4 and σ^2 = 8. For gamma(3,1), μ = 3 and σ^2 = 3. For gamma(3,2), μ = 6 and σ^2 = 12.


(c) The 95th percentile for gamma(2,1), gamma(2,2), gamma(3,1), and gamma(3,2) 
are respectively 4.743865, 2.371932, 6.295794, and 3.147897. 
14. (a) The four PDFs are given as

[Figure: Weibull PDFs; vertical axis: dweibull(x, 0.5, 1).]

(b) Using the commands, we have μ_T = 1200 and σ_T^2 = 361440000.
(c) Using the CDF,

P(20 < T ≤ 30) = F_T(30) − F_T(20)
= [1 − exp(−(30/10)^0.2)] − [1 − exp(−(20/10)^0.2)]
= 0.02931867.

The R command gives the same answer.

(d) To find the 95th percentile, the equation is
F_T(t_0.95) = 0.95, or 1 − exp(−(t_0.95/10)^0.2) = 0.95;

the solution is t_0.95 = 2412.765. The R command gives the same answer.
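Parts (b) to (d) can be reproduced as follows. The sketch assumes a Weibull distribution with shape 0.2 and scale 10, which is what the formulas above imply, and uses the standard moment formulas E(T) = scale·Γ(1 + 1/shape) and E(T²) = scale²·Γ(1 + 2/shape).

```r
shape <- 0.2
scale <- 10

mu_T     <- scale * gamma(1 + 1 / shape)                 # mean
sigma2_T <- scale^2 * gamma(1 + 2 / shape) - mu_T^2      # variance

p_20_30 <- pweibull(30, shape, scale) - pweibull(20, shape, scale)
t95     <- qweibull(0.95, shape, scale)                  # 95th percentile

# Closed-form percentile obtained by solving 1 - exp(-(t/scale)^shape) = 0.95
t95_closed <- scale * (-log(0.05))^(1 / shape)

print(c(mu_T = mu_T, sigma2_T = sigma2_T, p_20_30 = p_20_30, t95 = t95))
```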
Chapter 4


Jointly Distributed Random Variables


4.2 Describing Joint Probability Distributions 


hk (GYPURS LY So) Pa FS 8) Pa 9) Se 
PUL S1orY S02) = P(X a9 Y ae PX 2 8y 14 PX =, = 
NPC S38 VSP RKO =P H6 Y= 3) PX 1.7 
3) = 0.79, P(X > 2,Y > 2) = P(X =3,Y =3) =0.09. 
(b) The marginal PMF of X is p_X(1) = 0.09 + 0.12 + 0.13 = 0.34, p_X(2) = 0.12 + 0.11 + 0.11 = 0.34, p_X(3) = 0.13 + 0.10 + 0.09 = 0.32.

The marginal PMF of Y is p_Y(1) = 0.09 + 0.12 + 0.13 = 0.34, p_Y(2) = 0.12 + 0.11 + 0.10 = 0.33, p_Y(3) = 0.13 + 0.11 + 0.09 = 0.33.


2. (a) The marginal PMF of X is p_X(0.0) = 0.388 + 0.009 + 0.003 = 0.4, p_X(1.0) = 0.485 + 0.010 + 0.005 = 0.5, p_X(2.0) = 0.090 + 0.006 + 0.004 = 0.1.
The marginal PMF of Y is p_Y(0) = 0.388 + 0.485 + 0.090 = 0.963, p_Y(1) = 0.009 + 0.010 + 0.006 = 0.025, p_Y(2) = 0.003 + 0.005 + 0.004 = 0.012.


(b) (i) The probability that a randomly selected rat has one tumor is P(Y = 1) = p_Y(1) = 0.025.
(ii) The probability that a randomly selected rat has at least one tumor is P(Y ≥ 1) = p_Y(1) + p_Y(2) = 0.037.

(c) (i) This is the conditional probability P(Y = 0|X = 1.0) = P(X = 1.0, Y = 0)/P(X = 1.0) = 0.485/0.5 = 0.97.

(ii) This is the conditional probability P(Y > 0|X = 1.0) = 1 − P(Y = 0|X = 1.0) = 0.03.


3. (a) P(X ≤ 10, Y ≤ 2) = P(X = 8, Y = 1.5) + P(X = 8, Y = 2) + P(X = 10, Y = 1.5) + P(X = 10, Y = 2) = 0.3 + 0.12 + 0.15 + 0.135 = 0.705, and P(X ≤ 10, Y = 2) = P(X = 8, Y = 2) + P(X = 10, Y = 2) = 0.12 + 0.135 = 0.255.

(b) The marginal PMF of X is p_X(8) = 0.3 + 0.12 + 0.0 = 0.42, p_X(10) = 0.15 + 0.135 + 0.025 = 0.31, p_X(12) = 0.03 + 0.15 + 0.09 = 0.27.
The marginal PMF of Y is p_Y(1.5) = 0.3 + 0.15 + 0.03 = 0.48, p_Y(2) = 0.12 + 0.135 + 0.15 = 0.405, p_Y(2.5) = 0 + 0.025 + 0.09 = 0.115.

(c) We want to find P(X ≤ 10|Y = 2). Using the definition of conditional probability and parts (a) and (b),

\[ P(X \le 10 \mid Y = 2) = \frac{P(X \le 10,\ Y = 2)}{P(Y = 2)} = \frac{0.255}{0.405} = 0.6296. \]


4. (a) The table for the CDF is

              y
 F(x, y)   1      2      3
      1   0.09   0.21   0.34
 x    2   0.21   0.44   0.68
      3   0.34   0.67   1

(b) In this problem F_X(x) = F(x, ∞) = F(x, 3), so F_X(1) = 0.34, F_X(2) = 0.68, F_X(3) = 1. By the same reasoning, we have F_Y(1) = 0.34, F_Y(2) = 0.67, F_Y(3) = 1.


(c) P(X =2,Y =2) = F(2,2)— F(2,1) — F(1,2)+ F(1,1) =0.44—0.21—0.21+ 
0.09 = 0.11, as shown in the table of Exercise 1. 
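The marginals, the joint CDF table, and the recovery of a PMF value from the CDF can all be computed from a matrix. The sketch below assumes the joint PMF of Exercise 1 as it can be read off the sums in Exercise 1(b) (rows are x = 1, 2, 3 and columns are y = 1, 2, 3).

```r
# Joint PMF inferred from the marginal sums in Exercise 1(b)
p <- matrix(c(0.09, 0.12, 0.13,
              0.12, 0.11, 0.11,
              0.13, 0.10, 0.09),
            nrow = 3, byrow = TRUE,
            dimnames = list(x = 1:3, y = 1:3))

p_X <- rowSums(p)   # marginal PMF of X
p_Y <- colSums(p)   # marginal PMF of Y

# Joint CDF: cumulative sums down the rows, then across the columns
F_xy <- t(apply(apply(p, 2, cumsum), 1, cumsum))
print(F_xy)

# Exercise 4(c): recover p(2, 2) from the CDF by inclusion-exclusion
p22 <- F_xy[2, 2] - F_xy[2, 1] - F_xy[1, 2] + F_xy[1, 1]
print(p22)
```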


5. p_X1(0) is the sum of values in the first 3 by 3 cells and we have p_X1(0) = 0.3. Similarly, p_X1(1) = 0.3 and p_X1(2) = 0.4.

p_X2(0) is the sum of values in the first row, and we have p_X2(0) = 0.27. Similarly, p_X2(1) = 0.38, and p_X2(2) = 0.35.

p_X3(1) is the sum of values in the three columns for which X3 = 1, and we have p_X3(1) = 0.29. Similarly, p_X3(2) = 0.34, and p_X3(3) = 0.37.


6. (a) The sample space is {(x1, x2, x3, x4) | xi are integers, 0 ≤ x1 ≤ 3, 0 ≤ x2 ≤ 2, 0 ≤ x3 ≤ 1, x1 + x2 + x3 + x4 = 4}.

(b) The joint PMF of X1, X2, and X3 is

p(x1, x2, x3) = P(X1 = x1, X2 = x2, X3 = x3, X4 = 4 − x1 − x2 − x3)


(i) Nea) Nas) Oieeoscina) 


= (3) for 241 +2%9+23 <4, 
4 


and p(x1, x2, x3) = 0 if x1 + x2 + x3 > 4.


7. (a) To find k, we use ∫∫ f(x, y) dx dy = 1, thus

\[ 1 = \int\!\!\int f(x, y)\,dx\,dy = \int_0^2\!\int_x^3 kxy^2\,dy\,dx = \int_0^2 \frac{kx}{3}\,(3^3 - x^3)\,dx = \frac{k}{3}\left[\frac{27x^2}{2} - \frac{x^5}{5}\right]_0^2 = \frac{238}{15}\,k, \]

hence, k = 15/238.
(b) The joint CDF of X and Y is

\[ F(x, y) = \int_0^x\!\int_u^y kuv^2\,dv\,du = \int_0^x \frac{ku}{3}\,(y^3 - u^3)\,du = \left[\frac{ku^2y^3}{6} - \frac{ku^5}{15}\right]_0^x = kx^2y^3/6 - kx^5/15. \]
0 


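8. (a) Let region R = {(x, y) | 0 ≤ x ≤ y} ∩ {(x, y) | x + y ≤ 3}; then

\begin{align*}
P(X + Y \le 3) &= \int\!\!\int_R f(x, y)\,dx\,dy = \int_0^{1.5}\!\int_x^{3-x} 2e^{-x-y}\,dy\,dx = \int_0^{1.5} 2e^{-x}\left(e^{-x} - e^{-(3-x)}\right)dx \\
&= \int_0^{1.5} 2e^{-2x}\,dx - 2e^{-3} \times 1.5 = 1 - e^{-3} - 3e^{-3} = 1 - 4e^{-3}.
\end{align*}

(b) The marginal PDF of X is

\[ f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_x^{\infty} 2e^{-x-y}\,dy = 2e^{-x}\left[-e^{-y}\right]_x^{\infty} = 2e^{-2x} \quad\text{for } x > 0, \]

and f_X(x) = 0 for x ≤ 0.
The marginal PDF of Y is

\[ f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \int_0^y 2e^{-x-y}\,dx = 2e^{-y}\left[-e^{-x}\right]_0^y = 2e^{-y}\left[1 - e^{-y}\right] \quad\text{for } y > 0, \]

and f_Y(y) = 0 for y ≤ 0.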


4.3. Conditional Distributions 


1. (a) The marginal PMF of X is p_X(0) = 0.06 + 0.04 + 0.2 = 0.3, p_X(1) = 0.08 + 0.3 + 0.06 = 0.44, p_X(2) = 0.1 + 0.14 + 0.02 = 0.26.
The marginal PMF of Y is p_Y(0) = 0.06 + 0.08 + 0.1 = 0.24, p_Y(1) = 0.04 + 0.3 + 0.14 = 0.48, p_Y(2) = 0.2 + 0.06 + 0.02 = 0.28. X and Y are not independent; for example, P(X = 0, Y = 0) = 0.06 ≠ p_X(0)p_Y(0) = 0.072.

(b) The conditional PMF is p_{Y|X=0}(y) = P(X = 0, Y = y)/p_X(0). That is, the first row of the table is divided by p_X(0); thus p_{Y|X=0}(0) = 0.06/0.3 = 0.2, p_{Y|X=0}(1) = 0.133, and p_{Y|X=0}(2) = 0.667.

By the same reasoning, we have

p_{Y|X=1}(0) = 0.1818, p_{Y|X=1}(1) = 0.6818, and p_{Y|X=1}(2) = 0.1364.

p_{Y|X=2}(0) = 0.3846, p_{Y|X=2}(1) = 0.5385, and p_{Y|X=2}(2) = 0.0769.

The conditional PMF of Y depends on the value of X, thus X and Y are not independent.
(c) E(Y|X = 1) = 0 p_{Y|X=1}(0) + 1 p_{Y|X=1}(1) + 2 p_{Y|X=1}(2) = 0.9546, and E(Y^2|X = 1) = 0^2 p_{Y|X=1}(0) + 1^2 p_{Y|X=1}(1) + 2^2 p_{Y|X=1}(2) = 1.2274. Thus, Var(Y|X = 1) = E(Y^2|X = 1) − E(Y|X = 1)^2 = 1.2274 − 0.9546^2 = 0.3161.


2. (a) The regression function of Y on X is E(Y|X = 0) = 0 p_{Y|X=0}(0) + 1 p_{Y|X=0}(1) + 2 p_{Y|X=0}(2) = 1.4667, E(Y|X = 1) = 0.9546 as calculated in 1(c), and E(Y|X = 2) = 0 p_{Y|X=2}(0) + 1 p_{Y|X=2}(1) + 2 p_{Y|X=2}(2) = 0.6923.
(b) By the law of total expectation, E(Y) = E(Y|X = 0)p_X(0) + E(Y|X = 1)p_X(1) + E(Y|X = 2)p_X(2) = 1.4667 x 0.3 + 0.9546 x 0.44 + 0.6923 x 0.26 = 1.04.
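The conditional PMFs, the regression function, the conditional variance, and the law of total expectation in Exercises 1 and 2 can be checked with matrix arithmetic. The sketch below assumes the joint PMF implied by the marginal sums in Exercise 1(a) (rows x = 0, 1, 2; columns y = 0, 1, 2).

```r
# Joint PMF read off the sums in Exercise 1(a)
p <- matrix(c(0.06, 0.04, 0.20,
              0.08, 0.30, 0.06,
              0.10, 0.14, 0.02),
            nrow = 3, byrow = TRUE,
            dimnames = list(x = 0:2, y = 0:2))
y_vals <- 0:2

p_X  <- rowSums(p)
cond <- p / p_X                       # row i holds the conditional PMF of Y given X = i - 1

cond_mean <- as.vector(cond %*% y_vals)               # regression function E(Y | X = x)
cond_var  <- as.vector(cond %*% y_vals^2) - cond_mean^2

E_Y <- sum(cond_mean * p_X)           # law of total expectation

print(round(cond, 4))
print(round(cond_mean, 4))
print(round(cond_var, 4))
print(E_Y)
```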


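3. (a) The regression function is

\begin{align*}
E(Y \mid X = 8) &= \sum_y y\,p_{Y|X=8}(y) = \sum_y y\,p_{X,Y}(8, y)/p_X(8) \\
&= 1.5 \times 0.3/0.42 + 2 \times 0.12/0.42 + 2.5 \times 0/0.42 = 1.643, \\
E(Y \mid X = 10) &= \sum_y y\,p_{Y|X=10}(y) = \sum_y y\,p_{X,Y}(10, y)/p_X(10) \\
&= 1.5 \times 0.15/0.31 + 2 \times 0.135/0.31 + 2.5 \times 0.025/0.31 = 1.798,
\end{align*}

and

\begin{align*}
E(Y \mid X = 12) &= \sum_y y\,p_{Y|X=12}(y) = \sum_y y\,p_{X,Y}(12, y)/p_X(12) \\
&= 1.5 \times 0.03/0.27 + 2 \times 0.15/0.27 + 2.5 \times 0.09/0.27 = 2.111.
\end{align*}

(b) By the Law of Total Expectation

\begin{align*}
E(Y) &= \sum_x E(Y \mid X = x)\,p_X(x) \\
&= E(Y \mid X = 8)p_X(8) + E(Y \mid X = 10)p_X(10) + E(Y \mid X = 12)p_X(12) \\
&= 1.643 \times 0.42 + 1.798 \times 0.31 + 2.111 \times 0.27 = 1.817.
\end{align*}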
(c) From part (a), we see that the regression function of Y on X depends on the value of X. Thus, from part (1) of Proposition 4.3-3, the amount of tip left is not independent of the price of the meal.


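4. (a) The conditional PMF of Y given X = 1 can be calculated as

\[ p_{Y|X=1}(0) = \frac{p_{X,Y}(1, 0)}{p_X(1)} = \frac{0.485}{0.5} = 0.97, \qquad p_{Y|X=1}(1) = \frac{p_{X,Y}(1, 1)}{p_X(1)} = \frac{0.01}{0.5} = 0.02, \]

and

\[ p_{Y|X=1}(2) = \frac{p_{X,Y}(1, 2)}{p_X(1)} = \frac{0.005}{0.5} = 0.01. \]
(b) The regression function is

\begin{align*}
E(Y \mid X = 0) &= \sum_y y\,p_{Y|X=0}(y) = \sum_y y\,p_{X,Y}(0, y)/p_X(0) \\
&= 0 \times 0.388/0.4 + 1 \times 0.009/0.4 + 2 \times 0.003/0.4 = 0.0375, \\
E(Y \mid X = 1) &= \sum_y y\,p_{Y|X=1}(y) \\
&= 0 \times 0.97 + 1 \times 0.02 + 2 \times 0.01 = 0.04,
\end{align*}

and

\begin{align*}
E(Y \mid X = 2) &= \sum_y y\,p_{Y|X=2}(y) = \sum_y y\,p_{X,Y}(2, y)/p_X(2) \\
&= 0 \times 0.09/0.1 + 1 \times 0.006/0.1 + 2 \times 0.004/0.1 = 0.14.
\end{align*}

(c) By the Law of Total Expectation

\begin{align*}
E(Y) &= \sum_x E(Y \mid X = x)\,p_X(x) \\
&= E(Y \mid X = 0)p_X(0) + E(Y \mid X = 1)p_X(1) + E(Y \mid X = 2)p_X(2) \\
&= 0.0375 \times 0.4 + 0.04 \times 0.5 + 0.14 \times 0.1 = 0.049.
\end{align*}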


5. (a) The conditional PMF of Y depends on the value of X, thus X and Y are not 
independent. 


(b) The table for the joint PMF with the marginal PMFs is

                  y
 p(x, y)      0        1      p_X(x)
      0    0.3726   0.1674    0.54
 x    1    0.1445   0.0255    0.17
      2    0.2436   0.0464    0.29
 p_Y(y)    0.7607   0.2393

X and Y are not independent because P(X = x, Y = y) = p_X(x)p_Y(y) does not hold for all the (x, y) combinations.


6. (a) The regression function of Y on X is μ_{Y|X}(x) = E(Y|X = x) = 0 p_{Y|X=x}(0) + 1 p_{Y|X=x}(1) = p_{Y|X=x}(1); thus, we have E(Y|X = 0) = 0.31, E(Y|X = 1) = 0.15, E(Y|X = 2) = 0.16.
(b) By the law of total expectation, E(Y) = E(Y|X = 0)p_X(0) + E(Y|X = 1)p_X(1) + E(Y|X = 2)p_X(2) = 0.31 x 0.54 + 0.15 x 0.17 + 0.16 x 0.29 = 0.2393.


7. (a) E(Y|X = 1) = 1 p_{Y|X=1}(1) + 2 p_{Y|X=1}(2) = 1 x 0.66 + 2 x 0.34 = 1.34, E(Y^2|X = 1) = 1^2 p_{Y|X=1}(1) + 2^2 p_{Y|X=1}(2) = 1 x 0.66 + 4 x 0.34 = 2.02, thus Var(Y|X = 1) = E(Y^2|X = 1) − E(Y|X = 1)^2 = 2.02 − 1.34^2 = 0.2244.


(b) The table for the joint PMF is

               y
 p(x, y)    1       2
      1   0.132   0.068
 x    2   0.24    0.06
      3   0.33    0.17

(c) This problem is asking for P(Y = 1) = py(1) = 0.132 + 0.24 + 0.33 = 0.702. 


(d) This is asking for P(X = 1|Y = 1) = P(X = 1, Y = 1)/P(Y = 1) = 0.132/0.702 = 0.188.


8. (a) The regression function of Y on X is μ_{Y|X}(x) = E(Y|X = x) = 1 p_{Y|X=x}(1) + 2 p_{Y|X=x}(2). Thus, we have E(Y|X = 1) = 1.34, E(Y|X = 2) = 1.2, E(Y|X = 3) = 1.34.

(b) By the law of total expectation, E(Y) = E(Y|X = 1)p_X(1) + E(Y|X = 2)p_X(2) + E(Y|X = 3)p_X(3) = 1.34 x 0.2 + 1.2 x 0.3 + 1.34 x 0.5 = 1.298.


9. (a) We know the marginal PMF of X is p_X(4) = 0.3, p_X(5) = 0.5, and p_X(6) = 0.2. Thus,

\[ p_{X,Y}(x, 1) = P(Y = 1 \mid X = x)\,p_X(x) = \frac{(-0.8 + 0.04x)^4}{1 + (-0.8 + 0.04x)^4}\,p_X(x) \]

and

\[ p_{X,Y}(x, 0) = P(Y = 0 \mid X = x)\,p_X(x) = \frac{1}{1 + (-0.8 + 0.04x)^4}\,p_X(x). \]

Using these formulas, we get the following table for the joint PMF:

               y
 p(x, y)    0        1
      4   0.2569   0.0431
 x    5   0.4426   0.0574
      6   0.1821   0.0179

X and Y are not independent because the conditional distribution of Y depends on X.
(b) We have the marginal PMF of Y as p_Y(0) = 0.2569 + 0.4426 + 0.1821 = 0.8816 and p_Y(1) = 0.1184. Thus,

\begin{align*}
E(X \mid Y = 1) &= \sum_{x=4}^{6} x\,p_{X|Y=1}(x) = \sum_{x=4}^{6} x\,p_{X,Y}(x, 1)/p_Y(1) \\
&= 4 \times 0.0431/0.1184 + 5 \times 0.0574/0.1184 + 6 \times 0.0179/0.1184 \\
&= 4.7872,
\end{align*}

and

\begin{align*}
E(X \mid Y = 0) &= \sum_{x=4}^{6} x\,p_{X|Y=0}(x) = \sum_{x=4}^{6} x\,p_{X,Y}(x, 0)/p_Y(0) \\
&= 4 \times 0.2569/0.8816 + 5 \times 0.4426/0.8816 + 6 \times 0.1821/0.8816 \\
&= 4.9152.
\end{align*}


10. (a) By the definition of the binomial distribution, it is clear that given X = x, Y ~ Bin(x, 0.6). The joint PMF can be calculated using the formula

\[ p_{X,Y}(x, y) = \binom{x}{y}\,0.6^y\,0.4^{x-y}\,p_X(x), \quad\text{with } 0 \le y \le x \le 4. \]

According to this formula and the given marginal distribution of X, we have
p_{X,Y}(0, 0) = 0.1, p_{X,Y}(1, 0) = 0.08, p_{X,Y}(1, 1) = 0.12,
p_{X,Y}(2, 0) = 0.048, p_{X,Y}(2, 1) = 0.144, p_{X,Y}(2, 2) = 0.108,
p_{X,Y}(3, 0) = 0.016, p_{X,Y}(3, 1) = 0.072, p_{X,Y}(3, 2) = 0.108,
p_{X,Y}(3, 3) = 0.054, p_{X,Y}(4, 0) = 0.00384, p_{X,Y}(4, 1) = 0.02304,
and

p_{X,Y}(4, 2) = 0.05184, p_{X,Y}(4, 3) = 0.05184, p_{X,Y}(4, 4) = 0.01944.

(b) Given X = x, Y ~ Bin(x, 0.6), thus the regression function is E(Y|X = x) = 0.6x.

(c) By the law of total expectation, E(Y) = Σ_x E(Y|X = x)p_X(x) = 0.6 Σ_x x p_X(x) = 0.6 x [0 x 0.1 + 1 x 0.2 + 2 x 0.3 + 3 x 0.25 + 4 x 0.15] = 1.29.
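The whole joint PMF table can be generated from the conditional binomial model in one step. This sketch assumes the marginal PMF of X used above (0.1, 0.2, 0.3, 0.25, 0.15 for x = 0, ..., 4); the object names are illustrative.

```r
x_vals <- 0:4
y_vals <- 0:4
p_X <- c(0.1, 0.2, 0.3, 0.25, 0.15)       # P(X = 0), ..., P(X = 4)

# cond[i, j] = P(Y = y_j | X = x_i); dbinom returns 0 whenever y > x
cond  <- outer(x_vals, y_vals, function(x, y) dbinom(y, size = x, prob = 0.6))
joint <- cond * p_X                        # multiplies row i by p_X[i]
dimnames(joint) <- list(x = x_vals, y = y_vals)
print(joint)

# E(Y) computed directly from the marginal PMF of Y and via 0.6 * E(X)
E_Y_direct <- sum(colSums(joint) * y_vals)
E_Y_total  <- 0.6 * sum(x_vals * p_X)
print(c(E_Y_direct, E_Y_total))
```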


11. (a) The marginal PDF of X is

\[ f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_0^1 (x + y)\,dy = x + \frac{1}{2}, \quad\text{for } 0 \le x \le 1. \]

Thus, the conditional PDF of Y given X = x is

\[ f_{Y|X=x}(y) = \frac{f(x, y)}{f_X(x)} = \frac{x + y}{x + 1/2}, \quad\text{for } 0 \le y \le 1. \]

Hence,

\[ P(0.3 < Y < 0.5 \mid X = x) = \int_{0.3}^{0.5} f_{Y|X=x}(y)\,dy = \int_{0.3}^{0.5} \frac{x + y}{x + 1/2}\,dy = \frac{0.2x + 0.08}{x + 1/2}. \]

(b) By (4.3.16),

\[ P(0.3 < Y < 0.5) = \int_0^1 P(0.3 < Y < 0.5 \mid X = x)\,f_X(x)\,dx = \int_0^1 \frac{0.2x + 0.08}{x + 1/2}\,(x + 1/2)\,dx = 0.18. \]

12. (a) Independent 
(b) Not independent 
(c) Not independent 
13. Since T1 and T2 are the first two inter-arrival times of a Poisson process X(s), s ≥ 0, with rate α, according to Proposition 3.5-1, both T1 and T2 have an exponential distribution with PDF f(t) = αe^{−αt}, t > 0. To show that T1 and T2 are independent, let us consider the event T2 > t given T1 = s. This means that the first arrival occurs at s while the second arrival occurs after time s + t, thus there is no event in (s, s + t]. Therefore

P(T2 > t|T1 = s) = P(No events in (s, s + t]|T1 = s).

By the third postulate in Definition 3.4-1 of a Poisson process, the events “no event in (s, s + t]” and “T1 = s” are independent, thus

P(No events in (s, s + t]|T1 = s) = P(No events in (s, s + t]) = P(X(s + t) − X(s) = 0).

According to part (2) of Proposition 3.4-2, X(s + t) − X(s) has a Poisson distribution with parameter α(s + t − s) = αt, so

P(X(s + t) − X(s) = 0) = e^{−αt}.

Combining these arguments, we have

P(T2 > t|T1 = s) = e^{−αt}.

Hence, the conditional PDF of T2 given T1 = s is

\[ f_{T_2|T_1=s}(t) = \frac{d}{dt}P(T_2 \le t \mid T_1 = s) = -\frac{d}{dt}P(T_2 > t \mid T_1 = s) = \alpha e^{-\alpha t} = f_{T_2}(t), \]

and this shows that T1 and T2 are independent.
14. (a) T2 is the distance from the first pothole to the second one and, according to Proposition 3.5-1, T2 is exponential(0.16). Since the first pothole found is 8 miles from the start, the second pothole will be found between 14 and 19 miles from the start if and only if 14 − 8 < T2 < 19 − 8. Thus, the desired probability is P(6 < T2 < 11) = 0.2108 by the command pexp(11, 0.16) - pexp(6, 0.16).

(b) T1 and T2 are both exponential(0.16) random variables and the result from Exercise 13 shows that T1 and T2 are independent. Thus, E(T2|T1 = x) = E(T2) = 1/0.16 = 6.25. Hence, the regression function of Y on X is

E(Y|X = x) = E(T1 + T2|T1 = x) = x + E(T2|T1 = x) = x + 6.25.


15. The conditional PDF of X given Y = y is

\[ f_{X|Y=y}(x) = \frac{1}{y}\,e^{-x/y} \quad\text{for } x > 0, \]

and f_{X|Y=y}(x) = 0 otherwise. Thus, f_{X|Y=y}(x) depends on y. According to Proposition 4.3-2 (4), X and Y are not independent.


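16. (a) The regression function is

\[ E(Y \mid X = x) = \int_{-\infty}^{\infty} y\,f_{Y|X=x}(y)\,dy = \int_0^{\infty} y\,x e^{-xy}\,dy = \frac{1}{x}\int_0^{\infty} z e^{-z}\,dz = \frac{1}{x}\,\Gamma(2) = \frac{1}{x}, \]

where we used the variable transformation z = xy and the definition of the Gamma function.

Clearly, E(Y|X = 5.1) = 1/5.1 = 0.1961.

(b) By the law of total expectation,

\[ E(Y) = \int_{-\infty}^{\infty} E(Y \mid X = x)\,f_X(x)\,dx = \int_5^6 \frac{1}{x} \cdot \frac{1}{x(\log 6 - \log 5)}\,dx = \frac{1/5 - 1/6}{\log 6 - \log 5}. \]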


17. (a) The support of the joint PDF is given on the following page. 
(b) The support of the joint PDF is not a rectangle, thus X and Y are not independent.

(c) We have

\[ f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_0^{1-2x} 24x\,dy = 24x(1 - 2x), \quad\text{for } 0 \le x \le 0.5, \]

and f_X(x) = 0 otherwise.

\[ f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \int_0^{(1-y)/2} 24x\,dx = 12x^2\Big|_0^{(1-y)/2} = 3(1 - y)^2, \]

for 0 ≤ y ≤ 1, and f_Y(y) = 0 otherwise. Thus,

\[ E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_0^{0.5} 24x^2(1 - 2x)\,dx = \int_0^1 3t^2(1 - t)\,dt = 3\,\frac{2!\,1!}{4!} = \frac{1}{4}, \]

and

\[ E(Y) = \int_{-\infty}^{\infty} y f_Y(y)\,dy = \int_0^1 3y(1 - y)^2\,dy = 3\,\frac{1!\,2!}{4!} = \frac{1}{4}. \]


18. (a) The conditional PDF of Y given X = x is

\[ f_{Y|X=x}(y) = \frac{f(x, y)}{f_X(x)} = \frac{24x}{24x(1 - 2x)} = \frac{1}{1 - 2x}, \]

for 0 ≤ y ≤ 1 − 2x, and f_{Y|X=x}(y) = 0 otherwise.

Copyright © 2016 Pearson Education, Inc.

The regression function is

\[ E(Y \mid X = x) = \int_{-\infty}^{\infty} y\,f_{Y|X=x}(y)\,dy = \int_0^{1-2x} \frac{y}{1 - 2x}\,dy = \frac{1 - 2x}{2} = \frac{1}{2} - x. \]

The plot for the regression function is given as follows:

[Figure: plot of the regression function; vertical axis: regression(x).]

By the expression, E(Y|X = 0.3) = 0.5 − 0.3 = 0.2.


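(b) By the law of total expectation,

\begin{align*}
E(Y) &= \int_{-\infty}^{\infty} E(Y \mid X = x)\,f_X(x)\,dx = \int_0^{0.5} \frac{1 - 2x}{2}\,24x(1 - 2x)\,dx \\
&= \int_0^{0.5} 12x(1 - 2x)^2\,dx = \int_0^1 3t(1 - t)^2\,dt = 3\,\frac{1!\,2!}{4!} = \frac{1}{4}.
\end{align*}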


4.4 Mean Value of Functions of Random Variables 
1. The price the person pays is P = min{X, Y}. So

E(P) = min(150, 150) x 0.25 + min(150, 135) x 0.05 + min(150, 120) x 0.05
+ min(135, 150) x 0.05 + min(135, 135) x 0.2 + min(135, 120) x 0.1
+ min(120, 150) x 0.05 + min(120, 135) x 0.1 + min(120, 120) x 0.15 = 132,
and

E(P^2) = min(150, 150)^2 x 0.25 + min(150, 135)^2 x 0.05 + min(150, 120)^2 x 0.05
+ min(135, 150)^2 x 0.05 + min(135, 135)^2 x 0.2 + min(135, 120)^2 x 0.1
+ min(120, 150)^2 x 0.05 + min(120, 135)^2 x 0.1 + min(120, 120)^2 x 0.15
= 17572.5.

Therefore Var(P) = E(P^2) − E(P)^2 = 17572.5 − 132^2 = 148.5.
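The two sums above are expectations of a function of (X, Y) taken over the joint PMF, which can be written compactly with outer. The sketch assumes the joint PMF as it appears in the sums (rows and columns ordered 150, 135, 120).

```r
prices <- c(150, 135, 120)

# Joint PMF read off the terms of the sums above
p <- matrix(c(0.25, 0.05, 0.05,
              0.05, 0.20, 0.10,
              0.05, 0.10, 0.15),
            nrow = 3, byrow = TRUE,
            dimnames = list(x = prices, y = prices))

P_vals <- outer(prices, prices, pmin)   # min(x, y) for every (x, y) pair

E_P   <- sum(P_vals * p)                # E[min(X, Y)]
E_P2  <- sum(P_vals^2 * p)              # E[min(X, Y)^2]
Var_P <- E_P2 - E_P^2

print(c(E_P = E_P, E_P2 = E_P2, Var_P = Var_P))
```

Replacing pmin with another vectorised function of two arguments gives the mean and variance of any other function h(X, Y) in the same way.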


2. (a) Let X and Y be the times components A and B fail, respectively, so the system fails at time T = max{X, Y}. Then, the CDF of T is F_T(t) = 0 if t < 0 and F_T(t) = 1 if t > 1; if t ∈ [0, 1],

F_T(t) = P(T ≤ t) = P(max{X, Y} ≤ t) = P(X ≤ t, Y ≤ t)
= P(X ≤ t)P(Y ≤ t) = t^2,

where we used the independence of X and Y, and, since X and Y are uniform(0,1) random variables, P(X ≤ t) = P(Y ≤ t) = t.
Thus, the PDF of T is f_T(t) = 2t for t ∈ [0, 1] and f_T(t) = 0 otherwise.

(b) E(T) = ∫ t f_T(t) dt = ∫_0^1 2t^2 dt = 2/3 and

E(T^2) = ∫ t^2 f_T(t) dt = ∫_0^1 2t^3 dt = 1/2,

thus
Var(T) = E(T^2) − E(T)^2 = 1/2 − (2/3)^2 = 1/18.


3. The volume of the cylinder is h(X,Y) = 7Y?X. In Example 4.3-17 it was found 
that E[h(X, Y)] = (13/16)7. We also have 


3 70.75 9 
Eli (%Y)|= // h?(x, y) f (x, y)dady =) | gat fe dude 
0 J05 


372 3 . 0.75 3772 af 1 
= d 2 hip = | et F| lee 
3 / x of y ay 3 [ie q Ea 0.5 


\[ = \frac{1539}{2048}\,\pi^2. \]
Thus,
\[ \mathrm{Var}[h(X, Y)] = E[h^2(X, Y)] - E[h(X, Y)]^2 = \frac{1539}{2048}\,\pi^2 - \frac{13^2}{16^2}\,\pi^2 = 0.901. \]
4. (a) The total waiting time is T = X1 + X2 + ··· + X5 + Y1 + Y2 + Y3.
(b) The expected value is

E(T) = E(X1) + ··· + E(X5) + E(Y1) + E(Y2) + E(Y3)
= 3 x 5 + 6 x 3 = 33

and the variance is

Var(T) = Var(X1) + ··· + Var(X5) + Var(Y1) + Var(Y2) + Var(Y3)
= 2 x 5 + 4 x 3 = 22.

For the variance calculation to be valid, we have to assume that the waiting times are independent.


5. (a) Let X be the height of a randomly selected segment; then X is a uniform(35.5, 36.5) random variable. Thus E(X) = (35.5 + 36.5)/2 = 36, and Var(X) = (36.5 − 35.5)^2/12 = 1/12.

(b) Let H1 be the height of tower 1; then H1 = X1 + ··· + X30. Thus, E(H1) = E(X1) + ··· + E(X30) = 36 x 30 = 1080, and Var(H1) = Var(X1) + ··· + Var(X30) = 30/12 = 2.5.

(c) Let Y1, ..., Y30 be the heights of the segments used in tower 2, and let H2 be the height of tower 2; then H2 = Y1 + ··· + Y30. As in part (b), we can find E(H2) = 1080 and Var(H2) = 2.5. Let D be the difference of the heights of the two towers; then D = H1 − H2. It makes sense to assume that the concrete segments are independent, thus H1 and H2 are independent. Then, E(D) = E(H1) − E(H2) = 0 and Var(D) = Var(H1 − H2) = Var(H1) + Var(H2) = 5.


6. The number of injuries in a month is X1 + X2 + ··· + X_N and μ = E(Xi) = 1.5. Thus the expected number of injuries in a month is E(X1 + X2 + ··· + X_N) = E(N)μ = 7 x 1.5 = 10.5.


7. The total tips is T = X1 + ··· + X_{N1} + Y1 + ··· + Y_{N2}, and we know that μ1 = E(Xi) = 20, μ2 = E(Yi) = 10, N1 is Poisson(4), and N2 is Poisson(6). Thus, the expected value of the total amount of tips is

E(T) = E(X1 + ··· + X_{N1}) + E(Y1 + ··· + Y_{N2}) = E(N1)μ1 + E(N2)μ2
= 4 x 20 + 6 x 10 = 140.


8. The marginal PDF of X was derived in Example 4.3-9 as f_X(x) = 12x(1 − x)^2 for 0 ≤ x ≤ 1 and zero otherwise. Then,

\[ E(X) = \int_0^1 x f_X(x)\,dx = \int_0^1 12x^2(1 - x)^2\,dx = \frac{2}{5}. \]
By the symmetry of the joint PDF, the marginal PDF of Y is the same as that of X. It follows that E(Y) = E(X) = 2/5. We calculate

\[ E(XY) = \int\!\!\int xy\,f(x, y)\,dx\,dy = \int_0^1\!\int_0^{1-x} 24x^2y^2\,dy\,dx = 8\int_0^1 x^2(1 - x)^3\,dx = \frac{2}{15}. \]
Hence,
\[ \mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = \frac{2}{15} - \frac{2}{5} \times \frac{2}{5} = -\frac{2}{75}. \]

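9. $\mathrm{Cov}(X, Y) = \mathrm{Cov}(X, 9.3 + 1.5X + \varepsilon) = 1.5\,\mathrm{Cov}(X, X) + \mathrm{Cov}(X, \varepsilon) = 1.5\sigma_X^2 = 1.5 \times 9 = 13.5$, and $\mathrm{Cov}(\varepsilon, Y) = \mathrm{Cov}(\varepsilon, 9.3 + 1.5X + \varepsilon) = 1.5\,\mathrm{Cov}(\varepsilon, X) + \mathrm{Cov}(\varepsilon, \varepsilon) = \sigma_\varepsilon^2 = 16$.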


10. For a randomly selected customer, the total cost of the meal is 7 = X + Y. Thus 
the expected value is 


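\[
\begin{aligned}
E(T) = E(X + Y) &= \sum_x \sum_y (x + y)\,p(x, y) \\
&= (8 + 1.5) \times 0.3 + (8 + 2.0) \times 0.12 + (8 + 2.5) \times 0 \\
&\quad + (10 + 1.5) \times 0.15 + (10 + 2.0) \times 0.135 + (10 + 2.5) \times 0.025 \\
&\quad + (12 + 1.5) \times 0.03 + (12 + 2.0) \times 0.15 + (12 + 2.5) \times 0.09 \\
&= 11.5175,
\end{aligned}
\]

\[
\begin{aligned}
E(T^2) = E[(X + Y)^2] &= \sum_x \sum_y (x + y)^2\,p(x, y) \\
&= (8 + 1.5)^2 \times 0.3 + (8 + 2.0)^2 \times 0.12 + (8 + 2.5)^2 \times 0 \\
&\quad + (10 + 1.5)^2 \times 0.15 + (10 + 2.0)^2 \times 0.135 + (10 + 2.5)^2 \times 0.025 \\
&\quad + (12 + 1.5)^2 \times 0.03 + (12 + 2.0)^2 \times 0.15 + (12 + 2.5)^2 \times 0.09 \\
&= 136.0487.
\end{aligned}
\]

Therefore, $\mathrm{Var}(T) = E(T^2) - [E(T)]^2 = 136.0487 - 11.5175^2 = 3.3959$.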
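
Sums over a joint PMF like the ones above are convenient to evaluate with matrices. The following R sketch recomputes $E(T)$, $E(T^2)$, and $\mathrm{Var}(T)$; it assumes the joint PMF values that appear in the sums above (with $p(8, 2.5) = 0$, which is what makes the probabilities add to 1), and the object names are illustrative.

```r
# Joint PMF: rows index x = 8, 10, 12; columns index y = 1.5, 2.0, 2.5.
x <- c(8, 10, 12)
y <- c(1.5, 2.0, 2.5)
p <- matrix(c(0.30, 0.120, 0.000,
              0.15, 0.135, 0.025,
              0.03, 0.150, 0.090), nrow = 3, byrow = TRUE)
total <- outer(x, y, "+")      # total[i, j] = x[i] + y[j]
ET  <- sum(total * p)          # E(T)
ET2 <- sum(total^2 * p)        # E(T^2)
VT  <- ET2 - ET^2              # Var(T)
round(c(ET, ET2, VT), 4)
```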


11. Similar to the previous exercise, 
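\[
\begin{aligned}
E(8X + 10Y) &= \sum_x \sum_y (8x + 10y)\,p(x, y) \\
&= (8 \times 0 + 10 \times 0) \times 0.06 + (8 \times 0 + 10 \times 1) \times 0.04 + (8 \times 0 + 10 \times 2) \times 0.2 \\
&\quad + (8 \times 1 + 10 \times 0) \times 0.08 + (8 \times 1 + 10 \times 1) \times 0.3 + (8 \times 1 + 10 \times 2) \times 0.06 \\
&\quad + (8 \times 2 + 10 \times 0) \times 0.1 + (8 \times 2 + 10 \times 1) \times 0.14 + (8 \times 2 + 10 \times 2) \times 0.02 \\
&= 18.08
\end{aligned}
\]

and

\[
\begin{aligned}
E[(8X + 10Y)^2] &= \sum_x \sum_y (8x + 10y)^2\,p(x, y) \\
&= (8 \times 0 + 10 \times 0)^2 \times 0.06 + (8 \times 0 + 10 \times 1)^2 \times 0.04 + (8 \times 0 + 10 \times 2)^2 \times 0.2 \\
&\quad + (8 \times 1 + 10 \times 0)^2 \times 0.08 + (8 \times 1 + 10 \times 1)^2 \times 0.3 + (8 \times 1 + 10 \times 2)^2 \times 0.06 \\
&\quad + (8 \times 2 + 10 \times 0)^2 \times 0.1 + (8 \times 2 + 10 \times 1)^2 \times 0.14 + (8 \times 2 + 10 \times 2)^2 \times 0.02 \\
&= 379.52.
\end{aligned}
\]

Thus,

\[ \mathrm{Var}(8X + 10Y) = E[(8X + 10Y)^2] - [E(8X + 10Y)]^2 = 379.52 - 18.08^2 = 52.6336. \]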


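12. The joint PMF is

                 y
p(x,y)       1       2
      1    0.132   0.068
x     2    0.24    0.06
      3    0.33    0.17

Thus,

\[
\begin{aligned}
E(C) = E(2\sqrt{X} + 3Y^2) &= \sum_x \sum_y (2\sqrt{x} + 3y^2)\,p(x, y) \\
&= (2\sqrt{1} + 3 \times 1^2) \times 0.132 + (2\sqrt{1} + 3 \times 2^2) \times 0.068 + (2\sqrt{2} + 3 \times 1^2) \times 0.24 \\
&\quad + (2\sqrt{2} + 3 \times 2^2) \times 0.06 + (2\sqrt{3} + 3 \times 1^2) \times 0.33 + (2\sqrt{3} + 3 \times 2^2) \times 0.17 \\
&= 8.662579,
\end{aligned}
\]

\[
\begin{aligned}
E(C^2) = E[(2\sqrt{X} + 3Y^2)^2] &= \sum_x \sum_y (2\sqrt{x} + 3y^2)^2\,p(x, y) \\
&= (2\sqrt{1} + 3 \times 1^2)^2 \times 0.132 + (2\sqrt{1} + 3 \times 2^2)^2 \times 0.068 + (2\sqrt{2} + 3 \times 1^2)^2 \times 0.24 \\
&\quad + (2\sqrt{2} + 3 \times 2^2)^2 \times 0.06 + (2\sqrt{3} + 3 \times 1^2)^2 \times 0.33 + (2\sqrt{3} + 3 \times 2^2)^2 \times 0.17 \\
&= 92.41633.
\end{aligned}
\]

Hence, $\mathrm{Var}(C) = E(C^2) - [E(C)]^2 = 92.41633 - 8.662579^2 = 17.37606$.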
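13. (a) Since $X$, $Y$, and $Z$ are independent uniform(0,1) random variables, we have $\mathrm{Var}(X) = \mathrm{Var}(Y) = \mathrm{Var}(Z) = 1/12$. Thus,

\[ \mathrm{Var}(X_1) = \mathrm{Var}(X + Z) = \mathrm{Var}(X) + \mathrm{Var}(Z) = 1/6, \]
\[ \mathrm{Var}(Y_1) = \mathrm{Var}(Y + 2Z) = \mathrm{Var}(Y) + 4\mathrm{Var}(Z) = 5/12, \]

and

\[ \mathrm{Cov}(X_1, Y_1) = \mathrm{Cov}(X + Z, Y + 2Z) = \mathrm{Cov}(X, Y + 2Z) + \mathrm{Cov}(Z, Y + 2Z) = 2\,\mathrm{Cov}(Z, Z) = 2\,\mathrm{Var}(Z) = 1/6. \]

Hence,

\[ \mathrm{Var}(X_1 + Y_1) = \mathrm{Var}(X_1) + \mathrm{Var}(Y_1) + 2\,\mathrm{Cov}(X_1, Y_1) = 1/6 + 5/12 + 2 \times 1/6 = 11/12, \]

and

\[ \mathrm{Var}(X_1 - Y_1) = \mathrm{Var}(X_1) + \mathrm{Var}(Y_1) - 2\,\mathrm{Cov}(X_1, Y_1) = 1/6 + 5/12 - 2 \times 1/6 = 1/4. \]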


(b) Use the following commands:
set.seed(111); x=runif(10000); y=runif(10000); z=runif(10000)
x1 = x+z; y1 = y+2*z; var(x1 + y1); var(x1 - y1)


The commands give the sample variances of a sample of 10,000 $X_1 + Y_1$ values and a sample of 10,000 $X_1 - Y_1$ values. In the run reported here they are 0.923218 and 0.2471397, respectively (the exact figures depend on the random seed). These values are close to the values 11/12 = 0.9167 and 1/4 = 0.25 calculated in part (a).


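14. First we find

\[
\mathrm{Cov}(\bar X, \bar Y) = \mathrm{Cov}\left(\frac{1}{3}\sum_{i=1}^{3} X_i,\ \frac{1}{3}\sum_{j=1}^{3} Y_j\right) = \frac{1}{9}\sum_{i=1}^{3}\sum_{j=1}^{3} \mathrm{Cov}(X_i, Y_j)
= \frac{1}{9}\left[\mathrm{Cov}(X_1, Y_1) + \mathrm{Cov}(X_2, Y_2) + \mathrm{Cov}(X_3, Y_3)\right] = \frac{5}{3}.
\]

Also, $\mathrm{Var}(\bar X) = \sigma_X^2/3 = 3$ and $\mathrm{Var}(\bar Y) = \sigma_Y^2/3 = 4/3$. Thus,

\[ \mathrm{Var}(\bar X + \bar Y) = \mathrm{Var}(\bar X) + \mathrm{Var}(\bar Y) + 2\,\mathrm{Cov}(\bar X, \bar Y) = 3 + \frac{4}{3} + 2 \times \frac{5}{3} = \frac{23}{3}. \]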


15. The hypergeometric random variable $X$ with parameters $n$, $M_1$, and $M_2$ can be thought of as a sum of $n$ Bernoulli random variables $X_1, X_2, \ldots, X_n$, each with probability of success $p = M_1/(M_1 + M_2)$, i.e., $X = X_1 + X_2 + \cdots + X_n$. Thus,

\[ E(X) = E(X_1) + \cdots + E(X_n) = np = \frac{nM_1}{M_1 + M_2}. \]


16. From the model, $X_1, X_2, \ldots, X_r$ are independent and all of them have the geometric distribution with success probability $p$. Thus, $E(X_i) = 1/p$ and $\mathrm{Var}(X_i) = (1 - p)/p^2$. Since $X = X_1 + \cdots + X_r$, we have

\[ E(X) = E(X_1) + \cdots + E(X_r) = \frac{r}{p} \]

and

\[ \mathrm{Var}(X) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_r) = \frac{r(1 - p)}{p^2}. \]


4.5 Quantifying Dependence 


1. We have the joint PMF and the marginal PMFs as

                      y
p(x,y)       0       1       2     p_X(x)
      0    0.06    0.04    0.2     0.3
x     1    0.08    0.3     0.06    0.44
      2    0.1     0.14    0.02    0.26
p_Y(y)     0.24    0.48    0.28

Thus,

\[ E(X) = 0 \times 0.3 + 1 \times 0.44 + 2 \times 0.26 = 0.96, \]
\[ E(X^2) = 0^2 \times 0.3 + 1^2 \times 0.44 + 2^2 \times 0.26 = 1.48, \]
\[ E(Y) = 0 \times 0.24 + 1 \times 0.48 + 2 \times 0.28 = 1.04, \]
\[ E(Y^2) = 0^2 \times 0.24 + 1^2 \times 0.48 + 2^2 \times 0.28 = 1.6, \]
\[ \sigma_X^2 = E(X^2) - [E(X)]^2 = 1.48 - 0.96^2 = 0.5584, \]
\[ \sigma_Y^2 = E(Y^2) - [E(Y)]^2 = 1.6 - 1.04^2 = 0.5184, \]
\[ E(XY) = 1 \times 1 \times 0.3 + 1 \times 2 \times 0.06 + 2 \times 1 \times 0.14 + 2 \times 2 \times 0.02 = 0.78, \]

and

\[ \mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = 0.78 - 0.96 \times 1.04 = -0.2184. \]

Hence, the linear correlation coefficient of $X$ and $Y$ is

\[ \rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{-0.2184}{\sqrt{0.5584}\sqrt{0.5184}} = -0.4059. \]
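
The same quantities can be obtained directly from the joint PMF table. The R sketch below is one way to do it; the matrix layout (rows for x, columns for y) and the object names are choices made here for illustration.

```r
# Joint PMF of (X, Y): rows are x = 0, 1, 2 and columns are y = 0, 1, 2.
x <- 0:2
y <- 0:2
p <- matrix(c(0.06, 0.04, 0.20,
              0.08, 0.30, 0.06,
              0.10, 0.14, 0.02), nrow = 3, byrow = TRUE)
px <- rowSums(p)                      # marginal PMF of X
py <- colSums(p)                      # marginal PMF of Y
EX <- sum(x * px); EY <- sum(y * py)
VX <- sum(x^2 * px) - EX^2            # variance of X
VY <- sum(y^2 * py) - EY^2            # variance of Y
EXY <- sum(outer(x, y) * p)           # outer(x, y)[i, j] = x[i] * y[j]
cov_xy <- EXY - EX * EY
rho <- cov_xy / sqrt(VX * VY)
round(c(cov_xy, rho), 4)
```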


2. (a) With more drug administered, we would expect the laboratory rats to develop 
more tumors, thus X and Y are expected to be positively correlated. 


We have the joint PMF and the marginal PMFs as

                       y
p(x,y)       0        1        2      p_X(x)
      0    0.388    0.009    0.003    0.4
x     1    0.485    0.01     0.005    0.5
      2    0.09     0.006    0.004    0.1
p_Y(y)     0.963    0.025    0.012

Thus,

\[ E(X) = 0 \times 0.4 + 1 \times 0.5 + 2 \times 0.1 = 0.7, \]
\[ E(X^2) = 0^2 \times 0.4 + 1^2 \times 0.5 + 2^2 \times 0.1 = 0.9, \]
\[ E(Y) = 0 \times 0.963 + 1 \times 0.025 + 2 \times 0.012 = 0.049, \]
\[ E(Y^2) = 0^2 \times 0.963 + 1^2 \times 0.025 + 2^2 \times 0.012 = 0.073, \]
\[ \sigma_X^2 = E(X^2) - [E(X)]^2 = 0.9 - 0.7^2 = 0.41, \]
\[ \sigma_Y^2 = E(Y^2) - [E(Y)]^2 = 0.073 - 0.049^2 = 0.070599, \]
\[ E(XY) = 1 \times 1 \times 0.01 + 1 \times 2 \times 0.005 + 2 \times 1 \times 0.006 + 2 \times 2 \times 0.004 = 0.048, \]

and

\[ \mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = 0.048 - 0.7 \times 0.049 = 0.0137. \]

The positive covariance shows that $X$ and $Y$ are positively correlated.

(b) The linear correlation coefficient of $X$ and $Y$ is

\[ \rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{0.0137}{\sqrt{0.41}\sqrt{0.070599}} = 0.0805. \]


3. (a) Using the R commands 
x = c(12.8, 12.9, 12.9, 13.6, 14.5, 14.6, 15.1, 17.5, 19.5, 20.8)
iy = C150. 02,04, 10-78, Ba 7.1, 100, 10-8, 1120) 
var(x); var(y); cov(x,y); cor(x,y),
we get $S_X^2 = 8.268$, $S_Y^2 = 3.907$, $S_{X,Y} = 5.46$, and $r_{X,Y} = 0.9607$.
(b) If the distances had been given in inches, $S_X^2$, $S_Y^2$, and $S_{X,Y}$ would be changed by a factor of $12^2$, but $r_{X,Y}$ would be the same.


4. (a) We would expect the diameter and age to be positively correlated because, intuitively, an older tree generally has a bigger diameter. The scatterplot of the data confirms that the diameter and age are positively correlated.

(b) Using the commands cov(x,y); cor(x,y), we get the sample covariance and linear correlation of diameter and age to be 9308.47 and 0.9262236, respectively. On the basis of the scatterplot, we can conclude that linear correlation correctly captures the strength of the diameter-age dependence.


5. The commands give the correlation matrix as 


           Head.L     Head.W     Neck.G     Chest.G    Weight
Head.L   1.0000000  0.7677513  0.8932822  0.8584959  0.8374185
Head.W   0.7677513  1.0000000  0.8138328  0.8109276  0.8012839
Neck.G   0.8932822  0.8138328  1.0000000  0.9575036  0.9672750
Chest.G  0.8584959  0.8109276  0.9575036  1.0000000  0.9599134
Weight   0.8374185  0.8012839  0.9672750  0.9599134  1.0000000


It is observed that the variables Neck.G and Chest.G have the largest correlations with the variable Weight. Thus, we would say that the variables Neck.G and Chest.G are the two best single predictors of the variable Weight.


6. (a) The marginal distribution of X is Bernoulli(0.3). 


(b) If X = 1, that is, the first selection is defective, then there are 2 defective 
and 7 non-defective products left. Thus, Y|X = 1 ~ Bernoulli(2/9). By same 
reason, Y|X = 0 ~ Bernoulli(3/9). 

(c) $p_{X,Y}(1, 1) = P(Y = 1 | X = 1)P(X = 1) = 2/9 \times 0.3$, $p_{X,Y}(1, 0) = P(Y = 0 | X = 1)P(X = 1) = 7/9 \times 0.3$, $p_{X,Y}(0, 1) = P(Y = 1 | X = 0)P(X = 0) = 3/9 \times 0.7$, and $p_{X,Y}(0, 0) = P(Y = 0 | X = 0)P(X = 0) = 6/9 \times 0.7$.

(d) $p_Y(1) = p_{X,Y}(1, 1) + p_{X,Y}(0, 1) = 2/9 \times 0.3 + 3/9 \times 0.7 = 0.3$, thus the marginal distribution of $Y$ is Bernoulli(0.3), which is the same as that of $X$.


(e) From the joint distribution of $X$ and $Y$, we have

\[ E(XY) = \sum_{x=0}^{1}\sum_{y=0}^{1} xy\,p_{X,Y}(x, y) = p_{X,Y}(1, 1) = 2/9 \times 0.3. \]

Thus,

\[ \mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = \frac{2}{9} \times 0.3 - 0.3 \times 0.3 = -0.02333, \]

and the linear correlation coefficient is

\[ \rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}} = \frac{\frac{2}{9} \times 0.3 - 0.3 \times 0.3}{\sqrt{0.3 \times (1 - 0.3) \times 0.3 \times (1 - 0.3)}} = -\frac{1}{9}. \]

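7. (a) From the marginal distributions, we calculate

\[ E(X) = \int_0^{0.5} x f_X(x)\,dx = \int_0^{0.5} 24x^2(1 - 2x)\,dx = 3\int_0^1 t^2(1 - t)\,dt = 3\,\frac{2!\,1!}{4!} = \frac{1}{4}, \]
\[ E(X^2) = \int_0^{0.5} x^2 f_X(x)\,dx = \int_0^{0.5} 24x^3(1 - 2x)\,dx = \frac{3}{2}\int_0^1 t^3(1 - t)\,dt = \frac{3}{2}\,\frac{3!\,1!}{5!} = \frac{3}{40}, \]
\[ E(Y) = \int_0^1 y f_Y(y)\,dy = \int_0^1 3y(1 - y)^2\,dy = 3\,\frac{1!\,2!}{4!} = \frac{1}{4}, \]

and

\[ E(Y^2) = \int_0^1 y^2 f_Y(y)\,dy = \int_0^1 3y^2(1 - y)^2\,dy = 3\,\frac{2!\,2!}{5!} = \frac{1}{10}. \]

Thus, $\sigma_X^2 = E(X^2) - [E(X)]^2 = 3/40 - 1/16 = 1/80$, and $\sigma_Y^2 = E(Y^2) - [E(Y)]^2 = 1/10 - 1/16 = 3/80$. Further,

\[ E(XY) = \int_0^{0.5}\!\!\int_0^{1-2x} xy\,f(x, y)\,dy\,dx = \int_0^{0.5}\!\!\int_0^{1-2x} 24x^2 y\,dy\,dx = \int_0^{0.5} 12x^2(1 - 2x)^2\,dx = \frac{3}{2}\int_0^1 t^2(1 - t)^2\,dt = \frac{3}{2}\,\frac{2!\,2!}{5!} = \frac{1}{20}, \]

therefore, $\sigma_{X,Y} = E(XY) - E(X)E(Y) = 1/20 - 1/4 \times 1/4 = -1/80$.


(b) The linear correlation coefficient is

\[ \rho_{X,Y} = \frac{\sigma_{X,Y}}{\sigma_X \sigma_Y} = \frac{-1/80}{\sqrt{1/80}\sqrt{3/80}} = -\frac{\sqrt{3}}{3}. \]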
(c) Given $X = x$, the conditional PDF of $Y$ is $f_{Y|X=x}(y) = 0$ if $y \notin [0, 1 - 2x]$; otherwise

\[ f_{Y|X=x}(y) = \frac{f(x, y)}{f_X(x)} = \frac{24x}{24x(1 - 2x)} = \frac{1}{1 - 2x}. \]

Therefore, given $X = x$, $Y$ is uniformly distributed on $[0, 1 - 2x]$, hence the regression function of $Y$ on $X$ is

\[ E(Y | X = x) = \frac{1 - 2x}{2}. \]

The dependence between $X$ and $Y$ is not linear, thus it is not appropriate to use $\rho_{X,Y}$.


8. It is clear that $f(x)$ is an even function on $[-1, 1]$, thus $xf(x)$ and $x^3 f(x)$ are odd functions on $[-1, 1]$. Then $E(X) = \int_{-1}^{1} x f(x)\,dx = 0$ and, by the same reasoning, $E(X^3) = 0$. Hence, because $Y = X^2$,

\[ \mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = E(X^3) - E(X)E(Y) = 0. \]

Consequently,

\[ \rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = 0. \]

9. (a) It is seen that $f(x)$ is an even function on $[-1, 1]$, thus $xf(x)$ and $x^3 f(x)$ are odd functions on $[-1, 1]$. Then $E(X) = \int_{-1}^{1} x f(x)\,dx = 0$ and, by the same reasoning, $E(X^3) = 0$. Hence,

\[ \mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = E(X^3) - E(X)E(Y) = 0. \]

(b) When the value of $X$ is given as $x$, the value of $Y$ is known to be $x^2$. Thus $E(Y | X = x) = x^2$, without any calculation.

(c) The dependence between $X$ and $Y$ is not linear, thus it is not appropriate to use $\rho_{X,Y}$.


4.6 Models for Joint Distributions 


1. (a) From the model, we know that given $P = p$, the distribution of $Y$ is Bin($n$, $p$). Thus the joint PMF of $P$ and $Y$ is

\[ p_{P,Y}(p, y) = P(Y = y | P = p)P(P = p) = \binom{n}{y} p^y (1 - p)^{n-y} P(P = p). \]

In detail, for $y = 0, 1, \ldots, n$,

\[ p_{P,Y}(0.6, y) = 0.2\binom{n}{y} 0.6^y\, 0.4^{n-y}, \]
\[ p_{P,Y}(0.8, y) = 0.5\binom{n}{y} 0.8^y\, 0.2^{n-y}, \]

and

\[ p_{P,Y}(0.9, y) = 0.3\binom{n}{y} 0.9^y\, 0.1^{n-y}. \]


(b) We have the general formula $p_Y(y) = p_{P,Y}(0.6, y) + p_{P,Y}(0.8, y) + p_{P,Y}(0.9, y)$. Thus, when $n = 3$,

\[ p_Y(0) = 0.2\binom{3}{0} 0.6^0\, 0.4^3 + 0.5\binom{3}{0} 0.8^0\, 0.2^3 + 0.3\binom{3}{0} 0.9^0\, 0.1^3 = 0.0171, \]
\[ p_Y(1) = 0.2\binom{3}{1} 0.6^1\, 0.4^2 + 0.5\binom{3}{1} 0.8^1\, 0.2^2 + 0.3\binom{3}{1} 0.9^1\, 0.1^2 = 0.1137, \]
\[ p_Y(2) = 0.2\binom{3}{2} 0.6^2\, 0.4^1 + 0.5\binom{3}{2} 0.8^2\, 0.2^1 + 0.3\binom{3}{2} 0.9^2\, 0.1^1 = 0.3513, \]
\[ p_Y(3) = 0.2\binom{3}{3} 0.6^3\, 0.4^0 + 0.5\binom{3}{3} 0.8^3\, 0.2^0 + 0.3\binom{3}{3} 0.9^3\, 0.1^0 = 0.5179. \]
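
The marginal PMF of Y is a mixture of three binomial PMFs, which is short to evaluate in R. The sketch below assumes the prior probabilities 0.2, 0.5, 0.3 on p = 0.6, 0.8, 0.9 used above; the object names are illustrative.

```r
# Marginal PMF of Y as a mixture of Bin(n, p) PMFs, weighted by P(P = p).
p_vals  <- c(0.6, 0.8, 0.9)    # possible values of P
weights <- c(0.2, 0.5, 0.3)    # P(P = p) for each value
n <- 3
pY <- sapply(0:n, function(y) sum(weights * dbinom(y, n, p_vals)))
names(pY) <- 0:n
pY
```

Since pY is a PMF, sum(pY) should equal 1, which is a quick sanity check on the four values.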


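2. (a) From the model, we know that given $P = p$, the distribution of $Y$ is Bin($n$, $p$). Thus the joint density of $P$ and $Y$ is the conditional PMF of $Y$ given $P = p$ times the marginal PDF of $P$, that is,

\[ f_{P,Y}(p, y) = p_{Y|P=p}(y)\, f_P(p) = \binom{n}{y} p^y (1 - p)^{n-y}, \quad 0 < p < 1,\ y = 0, 1, \ldots, n. \]


(b) The marginal PMF of $Y$ is given by

\[ p_Y(y) = \int_0^1 f_{P,Y}(p, y)\,dp = \int_0^1 \binom{n}{y} p^y (1 - p)^{n-y}\,dp = \frac{1}{n + 1}. \]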
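
The integral in part (b) can be checked numerically. The following R sketch assumes, as the integrand above indicates, that P has the uniform(0, 1) density, and it picks n = 5 purely for illustration; each marginal probability should then be close to 1/(n + 1).

```r
# Numerically integrate the Bin(n, p) PMF over p in (0, 1) for each y.
n <- 5
pY <- sapply(0:n, function(y) {
  integrate(function(p) dbinom(y, n, p), lower = 0, upper = 1)$value
})
rbind(numeric = pY, closed_form = rep(1 / (n + 1), n + 1))
```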


3. We have $Y_1 \sim N(9.3 + 1.5 \times 20, 16) = N(39.3, 16)$ and $Y_2 \sim N(9.3 + 1.5 \times 25, 16) = N(46.8, 16)$.


(a) The 95th percentile of $Y_1$ can be found by qnorm(0.95, 39.3, 4), which gives 45.879.


(b) Since $Y_1$ and $Y_2$ are independent, $Y_2 - Y_1 \sim N(46.8 - 39.3, 16 + 16) = N(7.5, 32)$. Thus $P(Y_2 > Y_1) = P(Y_2 - Y_1 > 0)$, which can be found by the command 1 - pnorm(0, 7.5, sqrt(32)), and it gives 0.9076.


4. (a) From the regression model given, we have $E(Y | X = x) = 9.3 + 1.5x$, thus the marginal mean of $Y$ is $E(Y) = 9.3 + 1.5\mu_X = 9.3 + 1.5 \times 24 = 45.3$.

(b) From the model, we have $\beta_1 = 1.5$. Thus, by (4.6.8), the covariance of $X$ and $Y$ is $\sigma_{X,Y} = \beta_1 \sigma_X^2 = 1.5 \times 9 = 13.5$. The correlation coefficient is

\[ \rho_{X,Y} = \frac{\sigma_{X,Y}}{\sigma_X \sigma_Y} = \frac{13.5}{3 \times \sqrt{36.25}} = 0.7474. \]


5. (a) Since the marginal distribution of $X$ is normal and the conditional distribution of $Y$ given $X = x$ is also normal, the joint distribution of $X$ and $Y$ is bivariate normal with parameters $\mu_X = 24$, $\mu_Y = 45.3$, $\sigma_X^2 = 9$, $\sigma_Y^2 = 36.25$, and $\sigma_{X,Y} = 13.5$.

(b) Use the command pmnorm(c(25, 45), c(24, 45.3), matrix(c(9, 13.5, 13.5, 36.25), 2)), and it gives the probability as 0.42612.


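6. By the second mean plus error expression in (4.6.6), $Y = \beta_0 + \beta_1(X - \mu_X) + \varepsilon$. Because any (intrinsic) error variable has mean value zero, that is, $E(\varepsilon) = 0$, we have

\[ E(Y) = E(\beta_0 + \beta_1(X - \mu_X) + \varepsilon) = \beta_0 + \beta_1(E(X) - \mu_X) + E(\varepsilon) = \beta_0 + \beta_1(\mu_X - \mu_X) = \beta_0. \]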


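7. In the exponential regression, the conditional distribution of $Y$ given $X = x$ is given as

\[ f_{Y|X=x}(y) = \lambda(x) e^{-\lambda(x) y}, \quad y > 0. \]

Hence, the regression function of $Y$ on $X$ is

\[ E(Y | X = x) = \frac{1}{\lambda(x)}. \]


(a) Given $X = x$,

\[ E(Y | X = x) = \frac{1}{\lambda(x)} = \frac{1}{\exp(\alpha + \beta x)} = \frac{1}{\exp(4.2 + 3.1x)}. \]

$X$ has a uniform distribution on (2, 6), thus $f_X(x) = 1/4$ on (2, 6) and zero otherwise. By the Law of Total Expectation,

\[ E(Y) = \int_2^6 E(Y | X = x) f_X(x)\,dx = \int_2^6 \frac{1}{4} e^{-4.2 - 3.1x}\,dx = -\frac{1}{4 \times 3.1}\, e^{-4.2}\, e^{-3.1x}\Big|_2^6 = 2.454222 \times 10^{-6}. \]


(b) By the analysis above, the joint PDF of $X$ and $Y$ is $f_{X,Y}(x, y) = 0.25\lambda(x) e^{-\lambda(x) y}$, with $2 < x < 6$ and $y > 0$, and $f_{X,Y}(x, y) = 0$ otherwise.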
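
The Law of Total Expectation step in part (a) can be verified by numerical integration. The R sketch below uses the rate function exp(4.2 + 3.1x) and the uniform(2, 6) density from the solution; the tolerances passed to integrate() are tightened here because the answer is very small, and the object names are illustrative.

```r
# E(Y) = integral over (2, 6) of E(Y | X = x) * f_X(x), with f_X(x) = 1/4.
cond_mean <- function(x) 1 / exp(4.2 + 3.1 * x)    # E(Y | X = x) = 1 / lambda(x)
EY_numeric <- integrate(function(x) 0.25 * cond_mean(x), lower = 2, upper = 6,
                        rel.tol = 1e-10, abs.tol = 1e-14)$value
# Closed form of the same integral.
EY_closed <- exp(-4.2) * (exp(-3.1 * 2) - exp(-3.1 * 6)) / (4 * 3.1)
c(EY_numeric, EY_closed)
```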


8. (a) The marginal distribution of $N_4$ is Binomial with parameters $n = 16$ and $p = 0.02$. Thus, the probability that exactly one of the 16 planned chemical reactions will not be performed due to unusable raw materials is $P(N_4 = 1)$, which can be calculated by the command dbinom(1, 16, 0.02), and it gives 0.2363421.

(b) The probability that 10 chemical reactions will be performed with recent materials, 4 with moderately aged materials and 2 with aged materials, is $P(N_1 = 10, N_2 = 4, N_3 = 2, N_4 = 0)$, and it is calculated as

\[ P(N_1 = 10, N_2 = 4, N_3 = 2, N_4 = 0) = \binom{16}{10, 4, 2, 0} 0.6^{10}\, 0.3^4\, 0.08^2\, 0.02^0 = 0.03765241. \]


(c) The R command is dmultinom(c(10, 4, 2, 0), prob=c(0.6,0.3,0.08,0.02)) and it gives exactly the same answer as in part (b).


(d) The covariance can be calculated as

\[ \mathrm{Cov}(N_1 + N_2, N_3) = \mathrm{Cov}(N_1, N_3) + \mathrm{Cov}(N_2, N_3) = -np_1p_3 - np_2p_3 = -16 \times 0.6 \times 0.08 - 16 \times 0.3 \times 0.08 = -1.152. \]

In general, because the total number of items is fixed, if more items are aged, we would expect fewer recent and moderately aged items. Thus, it is reasonable for the covariance to be negative.


(e) The random variable $N_1 + N_2 + N_3$ has a Binomial distribution with $n = 16$ and $p = p_1 + p_2 + p_3 = 0.98$. Thus, $\mathrm{Var}(N_1 + N_2 + N_3) = np(1 - p) = 16 \times 0.98 \times 0.02 = 0.3136$.
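
The multinomial quantities in this exercise can be collected in a few lines of R. The sketch below uses the cell probabilities 0.6, 0.3, 0.08, 0.02 and n = 16 from the solution; the object names are illustrative.

```r
# Multinomial model with n = 16 trials and four categories.
n <- 16
p <- c(0.6, 0.3, 0.08, 0.02)
prob_b <- dmultinom(c(10, 4, 2, 0), prob = p)     # P(N1 = 10, N2 = 4, N3 = 2, N4 = 0)
cov_d  <- -n * p[1] * p[3] - n * p[2] * p[3]      # Cov(N1 + N2, N3)
var_e  <- n * sum(p[1:3]) * (1 - sum(p[1:3]))     # Var(N1 + N2 + N3)
c(prob_b, cov_d, var_e)
```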


9. (a) The marginal distribution of $N_3$ is Binomial with parameters $n = 15$ and $p = p_3 = 0.54$. Thus, the probability that exactly 10 children use a child seat is $P(N_3 = 10)$, which can be calculated by dbinom(10, 15, 0.54), and it gives 0.1304.


(b) The probability that exactly 10 children use a child seat and five use a seat belt is $P(N_1 = 0, N_2 = 5, N_3 = 10)$, and the R command is dmultinom(c(0,5,10), prob=c(0.17,0.29,0.54)), which gives 0.01298622.


(c) The random variable $N_2 + N_3$ has a binomial distribution with $n = 15$ and $p = p_2 + p_3 = 0.83$, thus $\mathrm{Var}(N_2 + N_3) = np(1 - p) = 15 \times 0.83 \times 0.17 = 2.1165$, and

\[ \mathrm{Cov}(N_1, N_2 + N_3) = \mathrm{Cov}(N_1, N_2) + \mathrm{Cov}(N_1, N_3) = -np_1p_2 - np_1p_3 = -15 \times 0.17 \times 0.29 - 15 \times 0.17 \times 0.54 = -2.1165. \]

Alternatively,

\[ \mathrm{Cov}(N_1, N_2 + N_3) = \mathrm{Cov}(n - (N_2 + N_3), N_2 + N_3) = -\mathrm{Cov}(N_2 + N_3, N_2 + N_3) = -\mathrm{Var}(N_2 + N_3) = -2.1165. \]


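10. (a) From (4.6.7), we have $\sigma_Y^2 - \sigma_\varepsilon^2 = \beta_1^2 \sigma_X^2$. Combining this with (4.6.8) gives

\[ \rho^2 = \frac{\beta_1^2 \sigma_X^2}{\sigma_Y^2} = \frac{\sigma_Y^2 - \sigma_\varepsilon^2}{\sigma_Y^2} = 1 - \frac{\sigma_\varepsilon^2}{\sigma_Y^2}, \]

that is, $1 - \rho^2 = \sigma_\varepsilon^2/\sigma_Y^2$.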
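(b) By Proposition 4.6-1, $\beta_0 = \mu_Y$, and from part (a), $\sigma_\varepsilon = \sqrt{1 - \rho^2}\,\sigma_Y$. Writing $\tilde x = x - \mu_X$ and $\tilde y = y - \mu_Y$, (4.6.3) is

\[
\begin{aligned}
f_{X,Y}(x, y) &= \frac{1}{2\pi\sigma_\varepsilon\sigma_X} \exp\left\{ -\frac{(y - \beta_0 - \beta_1(x - \mu_X))^2}{2\sigma_\varepsilon^2} - \frac{(x - \mu_X)^2}{2\sigma_X^2} \right\} \\
&= \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2}} \exp\left\{ -\frac{(\tilde y - \beta_1 \tilde x)^2}{2(1 - \rho^2)\sigma_Y^2} - \frac{\tilde x^2}{2\sigma_X^2} \right\} \\
&= \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2}} \exp\left\{ -\frac{\tilde y^2}{2(1 - \rho^2)\sigma_Y^2} + \frac{\beta_1 \tilde x \tilde y}{(1 - \rho^2)\sigma_Y^2} - \frac{\beta_1^2 \tilde x^2}{2(1 - \rho^2)\sigma_Y^2} - \frac{(1 - \rho^2)\tilde x^2}{2(1 - \rho^2)\sigma_X^2} \right\}
\end{aligned}
\]

(Plug in the relation $\beta_1 \sigma_X = \rho \sigma_Y$ from Proposition 4.6-3)

\[
\begin{aligned}
&= \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2}} \exp\left\{ -\frac{\tilde y^2}{2(1 - \rho^2)\sigma_Y^2} + \frac{\rho\, \tilde x \tilde y}{(1 - \rho^2)\sigma_X\sigma_Y} - \frac{\rho^2 \tilde x^2}{2(1 - \rho^2)\sigma_X^2} - \frac{(1 - \rho^2)\tilde x^2}{2(1 - \rho^2)\sigma_X^2} \right\} \\
&= \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{1 - \rho^2}\left[ \frac{\tilde x^2}{2\sigma_X^2} - \frac{\rho\, \tilde x \tilde y}{\sigma_X\sigma_Y} + \frac{\tilde y^2}{2\sigma_Y^2} \right] \right\},
\end{aligned}
\]

which is (4.6.13), as was to be proved.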


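(c) In (4.6.14),

\[ \Sigma = \begin{pmatrix} \sigma_X^2 & \sigma_{X,Y} \\ \sigma_{X,Y} & \sigma_Y^2 \end{pmatrix} = \begin{pmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{pmatrix}, \]

since $\rho = \sigma_{X,Y}/(\sigma_X\sigma_Y)$. It is easy to calculate that the determinant of $\Sigma$ is

\[ |\Sigma| = \sigma_X^2\sigma_Y^2 - (\rho\sigma_X\sigma_Y)^2 = (1 - \rho^2)\sigma_X^2\sigma_Y^2, \]

and the inverse of $\Sigma$ is

\[ \Sigma^{-1} = \frac{1}{|\Sigma|} \begin{pmatrix} \sigma_Y^2 & -\rho\sigma_X\sigma_Y \\ -\rho\sigma_X\sigma_Y & \sigma_X^2 \end{pmatrix} = \frac{1}{(1 - \rho^2)\sigma_X^2\sigma_Y^2} \begin{pmatrix} \sigma_Y^2 & -\rho\sigma_X\sigma_Y \\ -\rho\sigma_X\sigma_Y & \sigma_X^2 \end{pmatrix}. \]

Thus, with $\tilde x = x - \mu_X$ and $\tilde y = y - \mu_Y$,

\[
\begin{aligned}
\frac{1}{2}(x - \mu_X, y - \mu_Y)\,\Sigma^{-1} \begin{pmatrix} x - \mu_X \\ y - \mu_Y \end{pmatrix}
&= \frac{1}{2(1 - \rho^2)\sigma_X^2\sigma_Y^2}\,(\tilde x, \tilde y) \begin{pmatrix} \sigma_Y^2 & -\rho\sigma_X\sigma_Y \\ -\rho\sigma_X\sigma_Y & \sigma_X^2 \end{pmatrix} \begin{pmatrix} \tilde x \\ \tilde y \end{pmatrix} \\
&= \frac{\sigma_Y^2 \tilde x^2 - 2\rho\sigma_X\sigma_Y \tilde x \tilde y + \sigma_X^2 \tilde y^2}{2(1 - \rho^2)\sigma_X^2\sigma_Y^2} \\
&= \frac{1}{1 - \rho^2}\left[ \frac{\tilde x^2}{2\sigma_X^2} - \frac{\rho\, \tilde x \tilde y}{\sigma_X\sigma_Y} + \frac{\tilde y^2}{2\sigma_Y^2} \right].
\end{aligned}
\]

Hence (4.6.14) becomes

\[ \frac{1}{2\pi\sqrt{|\Sigma|}} \exp\left\{ -\frac{1}{2}(x - \mu_X, y - \mu_Y)\,\Sigma^{-1} \begin{pmatrix} x - \mu_X \\ y - \mu_Y \end{pmatrix} \right\} = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{1 - \rho^2}\left[ \frac{\tilde x^2}{2\sigma_X^2} - \frac{\rho\, \tilde x \tilde y}{\sigma_X\sigma_Y} + \frac{\tilde y^2}{2\sigma_Y^2} \right] \right\}. \]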
Chapter 5


Some Approximation Results 


5.2 The LLN and the Consistency of Averages 


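1. (a) The proof is

\[ P(|X - \mu| > a\sigma) \le \frac{\mathrm{Var}(X)}{(a\sigma)^2} = \frac{\sigma^2}{(a\sigma)^2} = \frac{1}{a^2}. \]

(b) Using the inequality in part (a), we get the upper bounds of 1, 0.25, and 0.11, respectively. To calculate the exact probability, since $X \sim N(\mu, \sigma^2)$, we have

\[ P(|X - \mu| > a\sigma) = P\left(\frac{|X - \mu|}{\sigma} > a\right) = P(|Z| > a) = P(Z < -a) + P(Z > a) = \Phi(-a) + 1 - \Phi(a) = 2\Phi(-a), \]

where $Z \sim N(0, 1)$ and $\Phi(\cdot)$ is the CDF of $N(0, 1)$. Using R commands, we can calculate the exact values when $a = 1$, 2, and 3 as 0.3173105, 0.04550026, and 0.002699796. The upper bounds are far larger than the exact probabilities, so Chebyshev's inequality is very conservative here.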
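
The comparison between the Chebyshev bound and the exact normal tail probability takes only a few lines of R. The sketch below evaluates both for a = 1, 2, 3; the object names are illustrative.

```r
# Chebyshev upper bound 1/a^2 versus the exact normal probability 2 * pnorm(-a).
a <- 1:3
bound <- 1 / a^2          # upper bound on P(|X - mu| > a * sigma)
exact <- 2 * pnorm(-a)    # exact value when X is normal
data.frame(a = a, bound = round(bound, 2), exact = exact)
```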


2. (a) By the LLN, $\bar X$ should be approximately equal to the expected life span of a randomly selected component, which is

\[ E(X) = \frac{1}{\lambda} = 1/0.013 = 76.92, \]

since the mean of a random variable having the exponential distribution with parameter $\lambda$ is $1/\lambda$.


(b) The problem asks for the probability

\[ P(|\bar X - \mu| < 15.38) = 1 - P(|\bar X - \mu| > 15.38). \]

By Chebyshev's inequality,

\[ P(|\bar X - \mu| > 15.38) \le \frac{\mathrm{Var}(\bar X)}{15.38^2} = \frac{\sigma^2/n}{15.38^2} = \frac{76.92^2/100}{15.38^2} = 0.25. \]

Thus, $P(|\bar X - \mu| < 15.38) = 1 - P(|\bar X - \mu| > 15.38) \ge 1 - 0.25 = 0.75$. We can say that the probability that $\bar X$ will be within 15.38 units from the population mean is at least 0.75.
3. (a) Let $X$ be a Poisson random variable with mean 1, that is, $\lambda = 1$; then $\mathrm{Var}(X) = 1$. Then $E(\bar X) = \lambda = 1$ and $\mathrm{Var}(\bar X) = \lambda/n = 1/10 = 0.1$. Thus, the calculation is

\[ P(0.5 \le \bar X \le 1.5) = P(-0.5 \le \bar X - 1 \le 0.5) = P(|\bar X - 1| \le 0.5) = 1 - P(|\bar X - 1| > 0.5) \ge 1 - \frac{\mathrm{Var}(\bar X)}{0.5^2} = 0.6. \]


(b) Using the fact that $Y = \sum_{i=1}^{10} X_i$ is a Poisson random variable with mean 10, we have

\[ P(0.5 \le \bar X \le 1.5) = P\left(5 \le \sum_{i=1}^{10} X_i \le 15\right) = P(Y \le 15) - P(Y \le 4) = 0.922, \]

which is calculated by the R command ppois(15, 10)-ppois(4, 10).
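
The Chebyshev lower bound and the exact Poisson probability can be put side by side in R. The sketch below assumes n = 10 Poisson(1) observations, as in the exercise; the object names are illustrative.

```r
# Lower bound from Chebyshev's inequality versus the exact Poisson calculation.
n <- 10; lambda <- 1
var_xbar <- lambda / n                     # Var of the sample mean
cheb_lower <- 1 - var_xbar / 0.5^2         # bound on P(|Xbar - 1| <= 0.5)
# The sum of the n observations is Poisson(n * lambda); 5 <= sum <= 15.
exact <- ppois(15, n * lambda) - ppois(4, n * lambda)
c(cheb_lower, exact)
```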


5.3 Convolutions 


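1. (a) Given $X = k$, we have $Z = X + Y = k + Y$, thus the sample space of $Z$ is $\{k, k + 1, \ldots, k + n_2\}$. Hence, for $z$ in the sample space,

\[ P(Z = z | X = k) = P(Y = z - k | X = k) = P(Y = z - k) = \binom{n_2}{z - k} p^{z-k}(1 - p)^{n_2 - (z - k)}. \]


(b) Since $Z = X + Y$, the sample space of $Z$ is $\{0, 1, \ldots, n_1 + n_2\}$. Hence, for $z$ in the sample space, by the total probability formula (with the convention that a binomial coefficient is zero when its lower index is out of range),

\[
\begin{aligned}
P(Z = z) &= \sum_{k=0}^{n_1} P(Z = z | X = k)P(X = k) \\
&= \sum_{k=0}^{n_1} \binom{n_2}{z - k} p^{z-k}(1 - p)^{n_2 - (z - k)} \binom{n_1}{k} p^k (1 - p)^{n_1 - k} \\
&= p^z (1 - p)^{n_1 + n_2 - z} \sum_{k=0}^{n_1} \binom{n_1}{k}\binom{n_2}{z - k} = \binom{n_1 + n_2}{z} p^z (1 - p)^{n_1 + n_2 - z},
\end{aligned}
\]

which shows that $Z \sim$ Bin($n_1 + n_2$, $p$).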
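
The convolution formula for the sum of two independent binomials can be checked numerically. The R sketch below uses hypothetical values n1 = 4, n2 = 6, p = 0.3 chosen only for illustration, and compares the total-probability sum with the Bin(n1 + n2, p) PMF.

```r
# P(Z = z) = sum over k of P(Y = z - k) * P(X = k), for X ~ Bin(n1, p), Y ~ Bin(n2, p).
n1 <- 4; n2 <- 6; p <- 0.3      # illustrative values, not from the exercise
z_vals <- 0:(n1 + n2)
pz_conv <- sapply(z_vals, function(z) {
  k <- 0:n1
  # dbinom returns 0 when z - k is outside 0..n2, so no range checks are needed.
  sum(dbinom(z - k, n2, p) * dbinom(k, n1, p))
})
pz_direct <- dbinom(z_vals, n1 + n2, p)
max(abs(pz_conv - pz_direct))
```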
2. Let $Y = X_1 + X_2$. Since $X_1$ and $X_2$ are both exponential random variables, the sample space of $Y$ is $(0, \infty)$, thus $f_Y(y) = 0$ for $y < 0$. For $y > 0$, we have

\[
f_Y(y) = \int_{-\infty}^{\infty} f_{X_2}(y - x_1) f_{X_1}(x_1)\,dx_1 = \int_0^y f_{X_2}(y - x_1) f_{X_1}(x_1)\,dx_1
= \int_0^y \lambda \exp(-\lambda(y - x_1))\,\lambda \exp(-\lambda x_1)\,dx_1 = \lambda^2 y \exp(-\lambda y).
\]
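
The convolution integral for the sum of two independent exponential random variables can be evaluated numerically and compared with the closed form. The R sketch below uses a hypothetical rate of 2 and a few illustrative values of y; the closed form is also the gamma density with shape 2 and the same rate.

```r
# Density of Y = X1 + X2 for independent exponential(lambda) variables.
lambda <- 2                      # illustrative rate, not from the exercise
conv_pdf <- function(y) {
  integrate(function(x1) dexp(y - x1, lambda) * dexp(x1, lambda),
            lower = 0, upper = y)$value
}
y <- c(0.5, 1, 2)
numeric_pdf <- sapply(y, conv_pdf)
closed_pdf  <- lambda^2 * y * exp(-lambda * y)
rbind(numeric_pdf, closed_pdf, gamma_pdf = dgamma(y, shape = 2, rate = lambda))
```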


3. (a) By Proposition 5.3-1, the distribution of $X_1 + X_2 + X_3$ is $N(\mu_{X_1} + \mu_{X_2} + \mu_{X_3}, \sigma_{X_1}^2 + \sigma_{X_2}^2 + \sigma_{X_3}^2) = N(180, 36)$. Thus, $P(X_1 + X_2 + X_3 > 185)$ can be found by 1 - pnorm(185, 180, 6), which gives 0.2023.

(b) By Corollary 5.3-1, $\bar X \sim N(\mu_1, \sigma_1^2/3) = N(60, 4)$ and $\bar Y \sim N(\mu_2, \sigma_2^2/3) = N(65, 5)$. Since $X_i$ and $Y_j$ are independent for all $i$ and $j$, $\bar X$ and $\bar Y$ are independent. Applying Proposition 5.3-1 again, we have $\bar Y - \bar X \sim N(5, 9)$. Thus, $P(\bar Y - \bar X > 8)$ can be found by 1 - pnorm(8, 5, 3), which gives 0.1587.


4. (a) The total duration is $X_1 + X_2 + X_3$, which follows $N(\mu_{X_1} + \mu_{X_2} + \mu_{X_3}, \sigma_{X_1}^2 + \sigma_{X_2}^2 + \sigma_{X_3}^2) = N(21, 9)$ by Proposition 5.3-1. Thus, the 95th percentile of the total duration can be found by the command qnorm(0.95, 21, 3), which gives 25.93 hours.


(b) This problem is asking for $P(X_1 + X_2 + X_3 < 25)$, which can be found by the command pnorm(25, 21, 3), which gives 0.9088.


(c) Let $p$ be the probability that the flashlight will last more than 25 hours; then, using part (b), $p = 1 - 0.9088 = 0.0912$. Let $Y$ be the number of trips on which the batteries will last more than 25 hours; then $Y \sim$ Bin(5, $p$). The problem is asking for $P(Y = 3)$, which can be found by the command dbinom(3, 5, 0.0912), which gives 0.0063.


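5. In probabilistic notation, we want to determine the sample size $n$ so that $P(|\bar X - \mu| < 0.005) = 0.95$. According to Corollary 5.3-1,

\[ \frac{\bar X - \mu}{\sigma/\sqrt{n}} \sim N(0, 1). \]

Thus, we can write the desired probability as

\[ P(|\bar X - \mu| < 0.005) = P\left( \frac{|\bar X - \mu|}{\sigma/\sqrt{n}} < \frac{0.005}{\sigma/\sqrt{n}} \right) = P\left( |Z| < \frac{0.005}{\sigma/\sqrt{n}} \right) = 0.95, \]

where $Z \sim N(0, 1)$. Hence, $\frac{0.005}{\sigma/\sqrt{n}} = z_{0.025} = 1.96$, and solving for $n$ gives us

\[ n = \left( \frac{1.96\sigma}{0.005} \right)^2 = 138.30. \]

Finally, we choose to use $n = 139$.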
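
The sample-size calculation can be written as a small R function. This is only a sketch: the function name is hypothetical, and the value sigma = 0.03 is an assumption inferred from the reported result 138.30 rather than a figure stated in this solution.

```r
# Smallest n with P(|Xbar - mu| < margin) >= conf for normally distributed data.
sample_size <- function(sigma, margin, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)       # z_{alpha/2}; about 1.96 for conf = 0.95
  ceiling((z * sigma / margin)^2)
}
n_needed <- sample_size(sigma = 0.03, margin = 0.005)   # sigma is an assumed value
n_needed
```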
5.4 The Central Limit Theorem


1. (a) In all 10 repetitions, the smallest and largest order statistics are outliers. The following lists the results of 10 runs:


      Min.     1st Qu.    Median      Mean     3rd Qu.      Max.
 -2006.0000   -1.2910   -0.0498    -8.2870    0.8515     95.2400
 -2635.0000   -0.7402    0.1411    -5.1150    1.0730     83.3200
  -322.3000   -0.6833    0.1635    -0.9070    1.2130    127.6000
  -509.9000   -0.8157    0.0690    -0.5815    1.1880    267.1000
  -990.6000   -0.9281    0.0867    -1.5660    1.0510    111.4000
  -889.2000   -0.7637    0.1762    -0.5776    0.9609    301.9000
   -88.95000  -1.09000  -0.01795   -0.68860   0.96320    29.37000
  -128.40000  -0.97310   0.08502    0.52190   1.14600   189.80000
 -1191.0000   -1.0610    0.0103    -3.1000    1.0710    651.3000
  -384.5000   -0.9337    0.0127    -0.3989    1.0870    130.1000


(b) The commands are repeated 10 times and the results are as follows:


      Min.     1st Qu.    Median      Mean     3rd Qu.      Max.
 -2913.0000   -1.0700    0.0694    -5.3770    0.9376    440.3000
  -301.4000   -1.1450   -0.0652     0.4325    1.0480    659.3000
  -233.50000  -1.11500  -0.09916   -0.47300   0.89380   118.60000
  -468.2000   -0.9738    0.0045     2.6000    0.8916   1374.0000
  -648.3000   -0.9662    0.0563    -1.9120    0.9866     64.4600
 -1444.0000   -1.1050   -0.0946    -3.2830    1.0250    102.3000
   -67.04000  -0.95250  -0.02498    0.61270   0.97650   190.80000
  -187.3000   -1.0690   -0.0209     5.8330    0.9322   2874.0000
  -217.2000   -0.7372    0.1418     0.3848    1.2980    125.8000
  -221.4000   -0.8928    0.0006     1.5640    1.0280    335.0000
-11040.000    -0.830     0.033    -21.480     0.986     235.200


As in part (a), the distribution of the averages seems to have several outliers.


2. (a) Since $X_1, \ldots, X_{30}$ are independent Poisson random variables having mean 1, by the CLT, $X_1 + \cdots + X_{30}$ is approximately $N(30, 30)$. Thus $P(X_1 + \cdots + X_{30} \le 35)$ can be found by pnorm(35, 30, sqrt(30)) if no continuity correction is used, and this gives 0.8193; with the continuity correction, the command is pnorm(35.5, 30, sqrt(30)), which gives 0.8423.


(b) By the property of Poisson random variables, $X_1 + \cdots + X_{30}$ has the Poisson distribution with mean 30. Thus, $P(X_1 + \cdots + X_{30} \le 35)$ can be found by ppois(35,30), which gives 0.8426 as the exact value. Clearly, the approximation with continuity correction gives a more accurate value.
3. Since the mean and variance of a uniform(0, 10) distribution are 5 and 100/12, respectively, according to the CLT, the total waiting time $S$ of the 120 waiting times has approximately the distribution

\[ S \sim N\left(5 \times 120,\ 120 \times \frac{100}{12}\right) = N(600, 1000). \]

Thus, the 95th percentile of the total waiting time can be found by the R command qnorm(0.95, 600, sqrt(1000)), which gives 652.0148.


4. (a) The gamma distribution with parameters $\alpha$ and $\beta$ has mean $\alpha\beta$ and variance $\alpha\beta^2$. Thus, by the CLT,

\[ \bar X_1 \sim N\left(\alpha_1\beta_1, \frac{\alpha_1\beta_1^2}{n_1}\right) = N(2 \times 2, 2 \times 2^2/36) = N(4, 0.2222), \]

and

\[ \bar X_2 \sim N\left(\alpha_2\beta_2, \frac{\alpha_2\beta_2^2}{n_2}\right) = N(1 \times 3, 1 \times 3^2/42) = N(3, 0.2143). \]

The two types of materials are independent, thus $\bar X_1$ and $\bar X_2$ are independent. By the property of the normal distribution,

\[ \bar X_1 - \bar X_2 \sim N(4 - 3, 0.2222 + 0.2143) = N(1, 0.4365). \]


(b) $P(\bar X_1 > \bar X_2) = P(\bar X_1 - \bar X_2 > 0) = 0.9349$, by the command 1-pnorm(0, 1, sqrt(0.4365)).


5. Let $S_1$ and $S_2$ be the total heights of towers 1 and 2, respectively, and let $X$ be the height of a randomly selected segment. Then $X \sim$ uniform(35.5, 36.5), hence $\mu = 36$ and $\sigma^2 = (36.5 - 35.5)^2/12 = 1/12$.

Clearly, $S_1 = X_1 + X_2 + \cdots + X_{30}$, where $X_1, X_2, \ldots, X_{30}$ are the heights of the 30 randomly selected segments. By the CLT, $S_1 \sim N(30\mu, 30\sigma^2)$. By the same reasoning, $S_2 \sim N(30\mu, 30\sigma^2)$. Since the segments for towers 1 and 2 are independent, we have $S_1 - S_2 \sim N(0, 2 \times 30\sigma^2) = N(0, 5)$.

The roadway can be laid when $|S_1 - S_2| < 4$. This probability is

\[ P(|S_1 - S_2| < 4) = P(S_1 - S_2 < 4) - P(S_1 - S_2 < -4) = 0.9264, \]

which is calculated by the command pnorm(4, 0, sqrt(5)) - pnorm(-4, 0, sqrt(5)). This probability is an approximation.


6. The mean and variance of the tip from a random customer are 1.8175 and 0.1154, respectively. Let $S$ be the total tips from the 70 customers; then, by the CLT, $S \sim N(n\mu, n\sigma^2) = N(70 \times 1.8175, 70 \times 0.1154)$. Thus, the probability for her tips to exceed $120 is $P(S > 120) = 0.9945$ by the command 1 - pnorm(120, 70*1.8175, sqrt(70*0.1154)).
7. Let $R$ be the round-off error; then $R$ has a uniform distribution on $(-0.5, 0.5)$. Thus, the mean value and variance of $R$ are $\mu = 0$ and $\sigma^2 = (0.5 - (-0.5))^2/12 = 1/12$. Let $R_1, \ldots, R_{50}$ be the round-off errors of the 50 numbers. By the CLT, the average round-off error $\bar R$ has approximately a normal distribution, $\bar R \sim N(\mu, \sigma^2/n) = N(0, 1/600)$. The event that the resulting average differs from the exact average of the 50 numbers by more than 0.1 happens if $|\bar R| > 0.1$, thus the corresponding probability is

\[ P(|\bar R| > 0.1) = P(\bar R > 0.1) + P(\bar R < -0.1) = 2P(\bar R < -0.1) = 0.0143, \]

which is calculated by the command 2*pnorm(-0.1, 0, sqrt(1/600)).


8. Let $T = X_1 + \cdots + X_n$ be the combined duration of $n$ components; we want $P(T > 3000) = 0.95$. By the CLT, $T \sim N(n\mu, n\sigma^2) = N(100n, 900n)$. Thus,

\[ \frac{T - 100n}{30\sqrt{n}} \sim N(0, 1). \]

Then,

\[ P(T > 3000) = P\left( \frac{T - 100n}{30\sqrt{n}} > \frac{3000 - 100n}{30\sqrt{n}} \right) = P\left( Z > \frac{3000 - 100n}{30\sqrt{n}} \right) = 0.95. \]

Thus, $\frac{3000 - 100n}{30\sqrt{n}}$ is the 5th percentile of $N(0, 1)$, which is $-1.645$, i.e.,

\[ \frac{3000 - 100n}{30\sqrt{n}} = -1.645, \]

or

\[ 10n - 3 \times 1.645\sqrt{n} - 300 = 0. \]

Using the command polyroot(c(-300, -3*1.645, 10)) gives us $\sqrt{n} = 5.73$, thus $n = 5.73^2 = 32.8$. Finally, we should take $n = 33$.
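
The quadratic in the square root of n can be solved in R, and the resulting n checked against the normal approximation. The sketch below uses the mean 100 and standard deviation 30 per component from the solution; the helper name prob_exceeds is hypothetical.

```r
# Solve 10 s^2 - 3 * 1.645 s - 300 = 0 for s = sqrt(n) and keep the positive root.
roots  <- polyroot(c(-300, -3 * 1.645, 10))
sqrt_n <- max(Re(roots))
n      <- ceiling(sqrt_n^2)
# Check with the CLT approximation T ~ N(100 n, 900 n).
prob_exceeds <- function(n) 1 - pnorm(3000, 100 * n, 30 * sqrt(n))
c(n = n, p_at_n = prob_exceeds(n), p_at_n_minus_1 = prob_exceeds(n - 1))
```

The probability at n should be at least 0.95, while the probability at n - 1 should fall short of it.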


9. (a) Let $Y$ be the coating thickness; then $Y = X_1 + X_2 + \cdots + X_{36}$, where $X_1, \ldots, X_{36}$ are the thicknesses of the layers, which are independent and have the same distribution with mean $\mu = 0.5$ and variance $\sigma^2 = 0.04$. Since $n = 36 > 30$, we have approximately

\[ Y \sim N(n\mu, n\sigma^2) = N(36 \times 0.5, 36 \times 0.04) = N(18, 1.44). \]


(b) The proportion is $P(Y < 16)$, which can be calculated using the R command pnorm(16, 18, 1.2), and the result is 0.0478.


10. (a) Let $\bar X$ be the average diameter of the 100 rods; according to the CLT, $\bar X \sim N(\mu, \sigma^2/n) = N(0.503, 0.03^2/100)$. The probability that the batch passes the inspection is $P(0.495 < \bar X < 0.505) = 0.7437$, which is given by the command pnorm(0.505, 0.503, 0.003) - pnorm(0.495, 0.503, 0.003).

(b) (i) $X$ has a binomial distribution with $n = 40$ and $p = 0.7437$. The exact value for $P(X \le 30)$ can be calculated by the command pbinom(30, 40, 0.7437), and the result is 0.5963.

(ii) By the DeMoivre-Laplace Theorem, $X$ approximately follows $N(np, np(1 - p)) = N(40 \times 0.7437, 40 \times 0.7437 \times (1 - 0.7437))$. Without the continuity correction, the command is pnorm(30, 40*0.7437, sqrt(40*0.7437*(1-0.7437))), which gives 0.5364; with the continuity correction, the command is pnorm(30.5, 40*0.7437, sqrt(40*0.7437*(1-0.7437))), which gives 0.6073. The method with continuity correction gives a more accurate result.


11. (a) $X$ has a binomial distribution with $n = 500$ and $p = 0.6$. The exact value for $P(270 \le X \le 320) = P(X \le 320) - P(X \le 269)$ can be calculated by the command pbinom(320, 500, 0.6)-pbinom(269, 500, 0.6), and the result is 0.9671.

(b) By the DeMoivre-Laplace Theorem, $X$ approximately follows $N(np, np(1 - p)) = N(500 \times 0.6, 500 \times 0.6 \times (1 - 0.6)) = N(300, 120)$. Without the continuity correction, the command is pnorm(320, 300, sqrt(120))-pnorm(270, 300, sqrt(120)), which gives 0.9630; with the continuity correction, the command is pnorm(320.5, 300, sqrt(120))-pnorm(269.5, 300, sqrt(120)), which gives 0.9667. The method with continuity correction gives a more accurate result.


12. Let $X$ be the thickness of a randomly selected tire; then $X \sim N(10, 4)$. Let $p$ be the rejection probability for a randomly selected tire; then $p = P(X < 7.9)$, which can be calculated by pnorm(7.9, 10, 2). Let $Y$ be the number of rejections among the 100 tires; then $Y \sim$ Bin(100, $p$), and we want to calculate $P(Y \le 10)$. By the DeMoivre-Laplace Theorem, $Y$ approximately follows $N(100p, 100p(1 - p))$. To calculate the probability, we apply the continuity correction, and the R command is pnorm(10.5, 100*pnorm(7.9, 10, 2), sqrt(100*pnorm(7.9, 10, 2)*(1-pnorm(7.9, 10, 2)))), which gives 0.1185 as the probability.


13. (a) Let $X_1$ and $X_2$ be the numbers of defective items found in lines A and B, respectively. Then, $X_1 \sim$ Bin(200, 0.1) and $X_2 \sim$ Bin(1000, 0.01). By the DeMoivre-Laplace Theorem, $X_1$ approximately follows $N(200 \times 0.1, 200 \times 0.1 \times 0.9) = N(20, 18)$, and $X_2$ approximately follows $N(1000 \times 0.01, 1000 \times 0.01 \times 0.99) = N(10, 9.9)$. By the independence of the two lines, $X_1 + X_2$ approximately follows $N(20 + 10, 18 + 9.9) = N(30, 27.9)$. The problem is to calculate $P(X_1 + X_2 \le 35)$. Using the continuity correction, the R command is pnorm(35.5, 30, sqrt(27.9)), which gives 0.8511.


(b) The commands are listed as


S = expand.grid(X=0:200, Y=0:1000)
P = expand.grid(px=dbinom(0:200, 200, .1), py=dbinom(0:1000, 1000, .01))
P$pxy = P$px*P$py; attach(P); attach(S); sum(pxy[which(X+Y<=35)])


The exact probability is given as 0.8509.
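
An alternative way to get the exact probability, shown here only as a sketch, avoids building the full grid of 201 x 1001 pairs: condition on the number of defectives in line A and use the binomial CDF for line B. The object names are illustrative, and the normal approximation with continuity correction is recomputed for comparison.

```r
# P(X1 + X2 <= 35) = sum over x of P(X1 = x) * P(X2 <= 35 - x).
x <- 0:200
exact <- sum(dbinom(x, 200, 0.1) * pbinom(35 - x, 1000, 0.01))  # pbinom is 0 for negative arguments
approx_cc <- pnorm(35.5, 30, sqrt(27.9))                        # DeMoivre-Laplace with continuity correction
c(exact = exact, approx_cc = approx_cc)
```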
Chapter 6
Fitting Models to Data


6.2 Some Estimation Concepts


1. Using the command OZ = read.table("OzoneData.txt", header=T) to read the data and mean(OZ$OzoneData) to get the sample mean, we get 286.3571. The command sd(OZ$OzoneData)/sqrt(14) gives an estimated standard error of 17.07244.


2. The difference of the average maximum penetration between the two types is estimated as $0.49 - 0.36 = 0.13$, and the estimated standard error of $\bar X_1 - \bar X_2$ is calculated as

\[ S_{\bar X_1 - \bar X_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} = \sqrt{\frac{0.19^2}{48} + \frac{0.16^2}{42}} = 0.0369. \]


3. The proof is straightforward:

\[
E(\hat\sigma^2) = E\left[ \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} \right] = \frac{(n_1 - 1)E(S_1^2) + (n_2 - 1)E(S_2^2)}{n_1 + n_2 - 2} = \frac{(n_1 - 1)\sigma^2 + (n_2 - 1)\sigma^2}{n_1 + n_2 - 2} = \sigma^2.
\]


4. (a) The parameter of interest is the proportion of all credit card customers who had incurred an interest charge in the previous year due to an unpaid balance. The empirical estimator is the proportion in a sample of credit card customers who had incurred an interest charge in the previous year due to an unpaid balance. Using the provided information, we get the estimate $\hat p = 136/200 = 0.68$.


(b) Yes, it is unbiased.

(c) The estimated standard error is

\[ S_{\hat p} = \sqrt{\frac{\hat p(1 - \hat p)}{n}} = \sqrt{\frac{0.68 \times (1 - 0.68)}{200}} = 0.033. \]
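5. The standard error is $\sigma_{\hat\theta} = \sigma_{2\bar X} = 2\sigma_{\bar X} = 2\theta/\sqrt{12n}$. $\hat\theta$ is unbiased because $E(\hat\theta) = E(2\bar X) = 2E(\bar X) = 2 \times \theta/2 = \theta$.


6. (a) $E(\hat p_1 - \hat p_2) = E(\hat p_1) - E(\hat p_2) = E(X)/m - E(Y)/n = mp_1/m - np_2/n = p_1 - p_2$, thus $\hat p_1 - \hat p_2$ is an unbiased estimator of $p_1 - p_2$.


(b) The standard error of $\hat p_1 - \hat p_2$ is

\[ \sigma_{\hat p_1 - \hat p_2} = \sqrt{\sigma_{\hat p_1}^2 + \sigma_{\hat p_2}^2} = \sqrt{\frac{p_1(1 - p_1)}{m} + \frac{p_2(1 - p_2)}{n}}. \]

The estimated standard error is

\[ S_{\hat p_1 - \hat p_2} = \sqrt{\frac{\hat p_1(1 - \hat p_1)}{m} + \frac{\hat p_2(1 - \hat p_2)}{n}}. \]


(c) From the data, we have $\hat p_1 = X/m = 70/100 = 0.7$ and $\hat p_2 = Y/n = 160/200 = 0.8$, thus the estimate of $p_1 - p_2$ is $\hat p_1 - \hat p_2 = -0.1$ and the estimated standard error is

\[ S_{\hat p_1 - \hat p_2} = \sqrt{\frac{0.7 \times (1 - 0.7)}{100} + \frac{0.8 \times (1 - 0.8)}{200}} = 0.0539. \]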

7. (a) The model-free estimation is 0.5. 


(b) Using the commands x=c(2.08, 2.10, 1.81, 1.98, 1.91, 2.06); 1-pnorm(2.05, 
mean(x), sd(x)), we get a model-based estimate of 0.2979. 


8. (a) The average of the 10,000 variances was computed as 0.083 (it might be 
different at a different time and with a different computer), which is very close 
to the population variance. On the other hand, the average of the 10,000 
sample standard deviations was computed as 0.235, which is not as close to the 
population version. Thus, we conclude that $S^2$ is unbiased but $S$ is biased. 


(b) In part (a), the bias of S is 0.235 — 0.2887 = —0.0537. When using the sample 
size n = 5, the average of the 10,000 sample standard deviations was computed 
as 0.278, with bias 0.278 — 0.2887 = —0.0107. Thus, we conclude that the bias 
of S decreases as the sample size increases. 
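
The simulation described in Exercise 8 can be sketched in R as follows. This is only an illustration under stated assumptions: the population is taken to be uniform(0, 1) (consistent with the population standard deviation 0.2887 quoted above), the sample size n = 2 for part (a) is an assumption that the solution does not restate, and the seed and the helper name sim_bias are arbitrary.

```r
# Simulation sketch: average of S^2 and of S over many uniform(0, 1) samples.
set.seed(1)  # arbitrary seed, used only for reproducibility

sim_bias <- function(n, reps = 10000) {
  samples <- matrix(runif(n * reps), nrow = reps)  # one sample per row
  v <- apply(samples, 1, var)                      # sample variances
  c(mean_var = mean(v), mean_sd = mean(sqrt(v)))
}

true_var <- 1 / 12
true_sd  <- sqrt(true_var)

res2 <- sim_bias(2)  # assumed sample size for part (a)
res5 <- sim_bias(5)  # sample size used in part (b)

# Estimated bias of S at each sample size (both should be negative,
# with the n = 5 bias closer to zero).
bias_sd <- c(n2 = res2[["mean_sd"]] - true_sd,
             n5 = res5[["mean_sd"]] - true_sd)
```

The exact averages change from run to run, as the solution notes, so only the qualitative pattern should be expected to repeat.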


9. (a) Using R commands we can get P(12 < X < 16) = 0.2956 and the 15th, 25th, 
55th, and 95th percentiles are 6.85, 8.30, 11.50, and 17.58, respectively. 


(b) The estimated value for P(12 < X < 16) is 0.38 and the estimated 15th, 
25th, 55th, and 95th percentiles are 6.37, 7.95, 13.09, and 18.89. 


(c) We use the following commands 


m = mean(x); s = sd(x); 
pnorm(16, m, s) - pnorm(12, m, s) 
qnorm(c(0.15, 0.25, 0.55, 0.95), m, s) 

The model-based estimate of P(12 < X < 16) is 0.289 and the model-based 
estimates of the 15th, 25th, 55th, and 95th percentiles are 6.67, 8.40, 12.22, and 
19.46. 


By comparing the results in (b) and (c) to those in (a), we can see that, in 
general, the model-based estimators are closer to the population values. 


10. (a) The normal Q-Q plot is given below. 


[Figure: Normal Q-Q Plot; Sample Quantiles versus Theoretical Quantiles] 


The figure suggests that the normal model for the data is appropriate. 


(b) The model-based estimate of P(44 < X < 46) is 0.39 and the model-based 
estimates of the median and the 75th percentile are 45.20 and 46.52, respectively. 


(c) The model-free estimate of P(44 < X < 46) is 0.375 and the model-free 
estimates of the median and the 75th percentile are 44.885 and 46.420, respectively. 


(d) Since the Q-Q plot suggests that normal assumption is appropriate, we would 
prefer the model-based estimation. 


6.3 Methods for Fitting Models to Data 


1. For the exponential($\lambda$) distribution, $\mu = 1/\lambda$. Setting $\bar X = 1/\lambda$, we can solve for 
the method of moments estimator of $\lambda$ as $\hat\lambda = 1/\bar X$. It is not an unbiased estimator 
because $E(1/\bar X) \neq 1/E(\bar X)$. 

2. (a) The commands to fit the Weibull($\alpha$, $\beta$) distribution are 

t=read.table("RobotReactTime.txt", header=T); t1=t$Time[t$Robot==1]; 
fn=function(a) 
{ (mu/gamma(1+1/a))**2*(gamma(1+2/a)-gamma(1+1/a)**2)-var } 
library(nleqslv); mu=mean(t1); var=var(t1); 
nleqslv(13, fn); mu/gamma(1+1/32.39172) 


The fitted model parameters are $\hat\alpha = 32.39$ and $\hat\beta = 31.05$. 


(b) To fit the exponential($\lambda$) distribution, using the results in Example 6.3-5, we 
have $\hat\lambda = 1/\bar X = 0.0328$. 

(c) The model-based estimate of the 80th population percentile is 31.51 under model 
(a) (using the command qweibull(0.8, 32.39, 31.05)) and it is 49.07 under model 
(b) (using the command qexp(0.8, 0.0328)). As for the probability P(28.15 < X < 
29.75), the estimates under the two models are 0.1805 and 0.0203, respectively. 

(d) Using the commands quantile(t1,0.8); sum(t1>=28.15&t1<=29.75)/length(t1), 
we get the empirical estimates of the 80th population percentile and the 
probability P(28.15 < X < 29.75) as 31.522 and 0.2727, respectively. 


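3. For the gamma($\alpha$, $\beta$) distribution, we have $\mu = \alpha\beta$ and $\sigma^2 = \alpha\beta^2$. Thus, $\beta = \sigma^2/\mu$ 
and $\alpha = \mu^2/\sigma^2$. We get the estimators $\hat\alpha = \bar X^2/S^2$ and $\hat\beta = S^2/\bar X$. For the given 
problem, $\hat\alpha = 113.5^2/1205.55 = 10.686$ and $\hat\beta = 1205.55/113.5 = 10.622$. 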


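4. (a) Since $\mu = \theta\sqrt{\pi/2}$, we have $\theta = \mu\sqrt{2/\pi}$. Thus, the method of moments 
estimator for $\theta$ is $\hat\theta = \bar X\sqrt{2/\pi}$. It is unbiased because 

$$E(\hat\theta) = E(\bar X)\sqrt{2/\pi} = \mu\sqrt{2/\pi} = \theta.$$


(b) A model-based estimator of the population variance is 

$$\hat\sigma^2 = \hat\theta^2\,\frac{4-\pi}{2} = \bar X^2\,\frac{4-\pi}{\pi}.$$

$\hat\sigma^2$ is not an unbiased estimator of $\sigma^2$ because 

$$E(\hat\sigma^2) = E(\bar X^2)\,\frac{4-\pi}{\pi} = \left(Var(\bar X) + E(\bar X)^2\right)\frac{4-\pi}{\pi} = \left(\frac{\sigma^2}{n} + \mu^2\right)\frac{4-\pi}{\pi} = \frac{\sigma^2}{n}\cdot\frac{4-\pi}{\pi} + \sigma^2 \neq \sigma^2.$$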


5. (a) Since X ~ Bin(n, p), E(X) = np. Thus, we can estimate p by $\hat p = X/n$. It is 
unbiased because $E(\hat p) = E(X)/n = np/n = p$. 


(b) By part (a), p = 24/37 = 0.6486. 


(c) The system lasts more than 350 hours if and only if both of the two components 
last more than 350 hours. By the independence of the two components, 
this probability is $p^2$ and can be estimated as $\hat p^2$. Given the information in 
(b), $\hat p^2 = (24/37)^2 = 0.4207$. 


(d) $\hat p^2$ is not an unbiased estimator of $p^2$ because $E(\hat p^2) = Var(\hat p) + E(\hat p)^2 = p(1-p)/n + p^2 \neq p^2$. 
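6. (a) Since the PMF of Poisson($\lambda$) is $e^{-\lambda}\lambda^x/x!$, the likelihood function is 

$$\prod_{i=1}^n \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} = \frac{e^{-n\lambda}\lambda^{\sum_{i=1}^n x_i}}{\prod_{i=1}^n x_i!}$$

and the log-likelihood function is 

$$\mathcal{L}(\lambda) = -n\lambda + \left(\sum_{i=1}^n x_i\right)\log\lambda - \sum_{i=1}^n \log x_i!.$$

Setting the first derivative of the log-likelihood function to zero yields the 
equation 

$$-n + \frac{1}{\lambda}\sum_{i=1}^n x_i = 0.$$

Solving this equation with respect to $\lambda$ yields the MLE $\hat\lambda = \bar X$ of $\lambda$. 
(b) The MLE estimate of $\lambda$ is $\bar X = 2.24$. 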


(c) The model-based estimate of the population variance is $\hat\sigma^2 = \hat\lambda = \bar X = 2.24$, and the 
sample variance is 1.533. Assuming the Poisson model correctly describes the 
population distribution, we would prefer the model-based estimate. 


7. (a) There are X + 5 helmets and the last one has a flaw; among the remaining X + 4 
helmets, there are 4 with a flaw and X flawless. Thus, we have the probability 

$$P(X = x) = \binom{x+4}{4} p^5 (1-p)^x.$$

Therefore, the log-likelihood function is 

$$\mathcal{L}(p) = \log\binom{X+4}{4} + 5\log p + X\log(1-p).$$

Setting the first derivative of the log-likelihood function to zero yields the 
equation 

$$\frac{5}{p} - \frac{X}{1-p} = 0.$$

Solving this equation yields the MLE $\hat p = 5/(5 + X)$. 
(b) The distribution of X is easily identified as negative binomial with r = 5 
and parameter p (compare to formula (3.4.15)). Thus, E(X) = r/p = 5/p. In 
method of moments estimation, set X = 5/p, and we can solve for the estimator 
$\hat p = 5/X$. 

(c) If X = 47, the MLE in (a) gives $\hat p = 5/(5 + 47) = 0.096$ and the method of 
moments formula in (b) gives $\hat p = 5/47 = 0.106$. 

8. (a) For the uniform(0, $\theta$) distribution, $E(X) = \theta/2$, thus the method of moments 
estimator is $\hat\theta = 2\bar X$. Using the commands set.seed(3333); x=runif(20, 0, 10); 
mean(x), we have $\bar X = 5.359$, thus $\hat\theta = 10.718$. The model-based estimator of 
the population variance $\sigma^2$ is $\hat\sigma^2 = \hat\theta^2/12$; thus, for this data set, the estimate 
is $10.718^2/12 = 9.573$. 

(b) The sample variance is 6.781. Compared to the true value of the population 
variance, $10^2/12 = 8.333$, the model-based estimate overestimates by 1.24, while 
the model-free estimate underestimates by 1.552. Thus, the model-based estimate 
provides a better approximation. 

9. (a) To get the method of moments estimator for $\theta$, solve the equation $\bar P = E(P)$, that is, 
$\bar P = \theta/(1+\theta)$, and we have the estimator 

$$\hat\theta = \frac{\bar P}{1-\bar P}.$$

(b) For the given data, the estimate of $\theta$ is $\hat\theta = 0.202$. 
10. (a) The regression coefficients are 

$$\hat\beta_1 = \frac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{n\sum x_i^2 - (\sum x_i)^2} = \frac{11 \times 400.5225 - 263.53 \times 36.66}{11 \times 9677.4709 - 263.53^2} = -0.1420,$$

$$\hat\alpha_1 = \bar Y - \hat\beta_1\bar X = \frac{36.66}{11} - (-0.1420)\,\frac{263.53}{11} = 6.735.$$

Thus, the regression line is $\hat y = \hat\alpha_1 + \hat\beta_1 x = 6.735 - 0.142x$. 

(b) The observed concentrations are in the range of 2.50 to 55.00. The 
concentrations 4.5 and 34.7 are in this range, but 62.8 is not, so it is 
appropriate to use the regression line at 4.5 and 34.7 only. 
The estimated expected corrosion rate at 4.5 is 6.735 - 0.142 x 4.5 = 6.096, 
and at 34.7 it is 6.735 - 0.142 x 34.7 = 1.808. 

11. (a) Using the commands x = c(498,526,559,614); y=c(16, 25, 34, 39); lm(y~x), 
we find the estimated regression line is $\hat y = -78.7381 + 0.1952x$. The expected 
number of manatee deaths in a year with 550,000 powerboat registrations is 
estimated as -78.7381 + 0.1952 x 550 = 28.62. 
(b) The R command for (6.3.11) is sum(y**2)+78.7381*sum(y)-0.1952*sum(x*y) 
and it gives 24.82 as the error sum of squares. The estimated intrinsic error variance is 


SSE/(n - 2) = 24.82/(4 - 2) = 12.41. 


(c) The command lm(y~x)$fitted gives the fitted values as 18.49371, 23.96056, 
30.40364, and 41.14209. The command lm(y~x)$resid gives the residuals as 
-2.493712, 1.039438, 3.596365, and -2.142091. The command sum((lm(y~x)$resid)**2) 
gives the sum of squared residuals and is the same as in part (b). 
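
The agreement between the computational formula in part (b) and the residual-based sum in part (c) can be checked with a short sketch. The response value 39 used here is the one implied by the fitted values and residuals listed in part (c); the variable names are illustrative only.

```r
# Manatee data from Exercise 11: registrations (in thousands) and deaths.
x <- c(498, 526, 559, 614)
y <- c(16, 25, 34, 39)  # fourth value implied by fitted value + residual in (c)

fit <- lm(y ~ x)
b <- coef(fit)          # b[1] = intercept, b[2] = slope

# SSE via the computational formula: sum(y^2) - a1*sum(y) - b1*sum(x*y)
sse_formula <- sum(y^2) - b[[1]] * sum(y) - b[[2]] * sum(x * y)

# SSE as the sum of squared residuals
sse_resid <- sum(resid(fit)^2)

# Estimate of the intrinsic error variance, SSE/(n - 2)
s2 <- sse_resid / df.residual(fit)
```

Using the unrounded coefficients from coef(fit) avoids the rounding error that the hand formula picks up from the four-decimal estimates.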


12. (a) The following shows the scatterplot of the data with the fitted regression line 
drawn through it. 


[Figure: scatterplot of Strength versus Modulus of Elasticity with the fitted regression line] 


From this graph, the linearity of the regression function and homoscedasticity 
appear to hold. 


(b) The LSE regression coefficients are $\hat\alpha_1 = 2.5801$ and $\hat\beta_1 = 0.1339$. We can 
estimate the expected strength at modulus of elasticity X = 60 as 2.5801 + 
0.1339 x 60 = 10.6141. 


(c) Using the commands out=lm(y~x); sum(out$resid**2); sum(out$resid**2)/out$df.resid, 
we find that the error sum of squares and the estimate of the intrinsic error variance 
are 15.16757 and 0.6067028, respectively. 


13. (a) The LSE regression coefficients are $\hat\alpha_1 = 19.9691$ and $\hat\beta_1 = 0.2255$. We can 
estimate the expected age at diameter x as $\hat y = 19.9691 + 0.2255x$. 


(b) The scatterplot of the data is shown on the next page. 

This figure suggests that the age of the tree increases with the diameter of 
the tree in an approximately linear fashion. Thus, the assumption of linearity of 
the regression function seems to be, at least approximately, satisfied. On the 
other hand, the variability in the age of trees seems to increase with the diameter 
of the tree. Thus, the homoscedasticity assumption appears to be violated for this 
data set. 


(c) The following shows the scatterplot of the transformed data. 


[Figure: scatterplot of logarithm of Age versus logarithm of Diameter] 

After the log-transformation, the assumptions of the simple linear regression 
model seem to be valid. 


6.4 Comparing Estimators: The MSE Criterion 


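1. (a) $Bias(\hat\theta_1) = E(\hat\theta_1) - \theta = E(2\bar X) - \theta = 2E(\bar X) - \theta = 2 \times \theta/2 - \theta = 0$. The bias 
for $\hat\theta_2$ is $Bias(\hat\theta_2) = E(\hat\theta_2) - \theta = n\theta/(n+1) - \theta = -\theta/(n+1)$. Thus, $\hat\theta_1$ is 
unbiased while $\hat\theta_2$ is biased. 


(b) For $\hat\theta_1$, we have 

$$MSE(\hat\theta_1) = Var(\hat\theta_1) + Bias(\hat\theta_1)^2 = Var(2\bar X) = 4Var(\bar X) = 4\,\frac{\theta^2}{12n} = \frac{\theta^2}{3n}.$$

For $\hat\theta_2$, 

$$MSE(\hat\theta_2) = Var(\hat\theta_2) + Bias(\hat\theta_2)^2 = \frac{n\theta^2}{(n+1)^2(n+2)} + \left(\frac{\theta}{n+1}\right)^2 = \frac{2\theta^2}{(n+1)(n+2)}.$$


(c) When n = 5 and the true value of $\theta$ is 10, we have $MSE(\hat\theta_1) = 10^2/(3 \times 5) = 6.67$, 
while $MSE(\hat\theta_2) = 2 \times 10^2/[(5+1)(5+2)] = 4.76$. According to the MSE 
selection criterion, $\hat\theta_2$ is preferable. 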
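
As a quick numerical check of part (c), the two closed-form MSE expressions from part (b) can be evaluated directly. The function names below are illustrative, not part of the exercise.

```r
# MSE of theta1-hat = 2 * Xbar, from part (b): theta^2 / (3n)
mse_theta1 <- function(theta, n) theta^2 / (3 * n)

# MSE of theta2-hat, from part (b): 2 * theta^2 / ((n + 1)(n + 2))
mse_theta2 <- function(theta, n) 2 * theta^2 / ((n + 1) * (n + 2))

# Values for n = 5 and theta = 10, as in part (c)
m1 <- mse_theta1(10, 5)
m2 <- mse_theta2(10, 5)

# Ratio MSE(theta2-hat) / MSE(theta1-hat) for a few sample sizes;
# it simplifies to 6n / ((n + 1)(n + 2)) and does not depend on theta.
ratio <- sapply(c(2, 5, 10, 50), function(n) mse_theta2(10, n) / mse_theta1(10, n))
```

The ratio equals 1 at n = 2 and falls below 1 for larger n, so under these formulas the preference for the second estimator grows with the sample size.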


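2. From the distributions of $X_1, \ldots, X_{10}$ and $Y_1, \ldots, Y_{10}$, we have $E(\bar X) = E(\bar Y) = \mu$, 
$Var(\bar X) = \sigma^2/10$, and $Var(\bar Y) = 4\sigma^2/10$. $\bar X$ and $\bar Y$ are also independent. Thus, 


(a) For any $0 \le a \le 1$, $E(\hat\mu) = E(a\bar X + (1-a)\bar Y) = aE(\bar X) + (1-a)E(\bar Y) = a\mu + (1-a)\mu = \mu$. 
Thus, $\hat\mu$ is unbiased for $\mu$. 
(b) Since $\hat\mu$ is unbiased, 

$$MSE(\hat\mu) = Var(\hat\mu) = Var(a\bar X + (1-a)\bar Y) = a^2 Var(\bar X) + (1-a)^2 Var(\bar Y) = a^2\,\frac{\sigma^2}{10} + (1-a)^2\,\frac{4\sigma^2}{10} = (5a^2 - 8a + 4)\,\frac{\sigma^2}{10}.$$


(c) The estimator $0.5\bar X + 0.5\bar Y$ corresponds to $\hat\mu$ with a = 0.5. The MSE is 

$$MSE(0.5\bar X + 0.5\bar Y) = (5 \times 0.5^2 - 8 \times 0.5 + 4)\,\frac{\sigma^2}{10} = 1.25\,\frac{\sigma^2}{10}.$$

Since $MSE(\bar X) = Var(\bar X) = \sigma^2/10 < MSE(0.5\bar X + 0.5\bar Y)$, $\bar X$ is the preferable 
estimator. 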
Chapter 7 
Confidence and Prediction Intervals 


7.3 Type of Confidence Intervals 


1. (a) The 95% CI for the mean $\mu$ is $\bar X \pm t_{n-1,\alpha/2}\,S/\sqrt{n}$, or $45.18 \pm t_{11-1,0.05/2} \times 11.48/\sqrt{11}$, 
which is calculated as (37.47, 52.89). In order to make the CI valid, we need 
to assume that the data are approximately normally distributed. 


(b) The normal Q-Q plot is shown below. 


[Figure: Normal Q-Q Plot; Sample Quantiles versus Theoretical Quantiles] 


The Q-Q plot shows that the normal assumption is approximately valid. 
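
The interval in part (a) is computed from summary statistics only, so it can be reproduced without the raw data. The following is a minimal sketch; the function name t_ci and its argument names are illustrative.

```r
# t-based confidence interval for a mean from summary statistics:
# xbar +/- t_{n-1, alpha/2} * s / sqrt(n)
t_ci <- function(xbar, s, n, level = 0.95) {
  alpha <- 1 - level
  margin <- qt(1 - alpha / 2, df = n - 1) * s / sqrt(n)
  c(lower = xbar - margin, upper = xbar + margin)
}

# Exercise 1(a): xbar = 45.18, S = 11.48, n = 11, 95% level
ci1 <- t_ci(45.18, 11.48, 11, level = 0.95)
```

When the raw data are available, confint(lm(x~1), level=...) as used in the later exercises returns the same interval.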


2. (a) Using the R commands x=c(649, 832, 418, 530, 384, 899, 755); 
confint(lm(x~1), level=0.9), we find the 90% CI for the true mean histamine 
content for all worker bees of this age as (489.9886, 786.2971). In order to make 
the CI valid, we need to assume that the data are approximately normally 
distributed. 
(b) False 


3. (a) The 80% CI for the mean breaking strength $\mu$ is $\bar X \pm t_{n-1,\alpha/2}\,S/\sqrt{n}$, or 
$210 \pm t_{50-1,0.2/2} \times 18/\sqrt{50}$, which is calculated as (206.69, 213.31). Since the sample 
size n = 50 is large enough, the normality assumption is not necessary. 

(b) Yes 
(c) No 


4. (a) In Exercise 1, after sorting, the data are 19.62, 36.75, 36.86, 41.72, 46.84, 47.53, 
48.42, 48.82, 50.59, 57.16, 62.69. Thus, the CI (36.86, 50.59) is $(X_{(3)}, X_{(9)})$. By 
the command 2*(1-pbinom(11-3, 11, 0.5)), we find $\alpha = 0.06542969$, thus the 
confidence level is $(1-\alpha)100\% = 93.46\%$. The CI (36.75, 57.16) is $(X_{(2)}, X_{(10)})$ 
and the confidence level is 98.83%. 


(b) In Exercise 2, after sorting, the data are 384, 418, 530, 649, 755, 832, 899. Thus, 
the CI (418, 832) is $(X_{(2)}, X_{(6)})$. By the command 2*(1-pbinom(7-2, 7, 0.5)), 
we find $\alpha = 0.125$, thus the confidence level is $(1-\alpha)100\% = 87.5\%$. 
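
The confidence-level calculation used in both parts follows one pattern: for the interval formed by the a-th smallest and a-th largest of n observations, the level is 1 - 2*(1 - pbinom(n - a, n, 0.5)). A small helper, with an illustrative name, makes this explicit.

```r
# Confidence level of the order-statistic interval (X_(a), X_(n-a+1))
# for the median, based on the Bin(n, 0.5) distribution.
sign_ci_level <- function(n, a) {
  alpha <- 2 * (1 - pbinom(n - a, n, 0.5))
  1 - alpha
}

level_11_3 <- sign_ci_level(11, 3)  # (X_(3), X_(9)) with n = 11
level_11_2 <- sign_ci_level(11, 2)  # (X_(2), X_(10)) with n = 11
level_7_2  <- sign_ci_level(7, 2)   # (X_(2), X_(6)) with n = 7
```

Because n and a are integers, only a discrete set of confidence levels is attainable, which is why the levels above are 93.46%, 98.83%, and 87.5% rather than round numbers.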


5. (a) The normal Q-Q plot is shown below. 


[Figure: Normal Q-Q Plot; Sample Quantiles versus Theoretical Quantiles] 


The Q-Q plot shows that the normal assumption is not quite valid. 


(b) The 90% CI for the mean ozone level is given by the command 
confint(lm(x~1), level=0.9) and the result is (256.123, 316.5913). 


(c) The 90% CI for the median ozone level is given by the command library(BSDA); 
SIGN.test(x, alternative="two.sided", conf.level=0.9) and the result is (248.0466, 
291.1626). 
(d) The length of the 90% CI for the mean is 60.468, while the length of the 90% 
CI for the median is 43.116. Clearly, the 90% CI for the median is shorter. 
Since the normality assumption does not seem appropriate for these data, we would 
prefer the 90% CI for the median. 


6. (a) The 95% CI for the mean solar intensity is given by the command 
confint(lm(x~1), level=0.95) and the result is (706.4223, 721.9277). Since the 
sample size n = 40 is large enough, the normality assumption is not necessary. 


(b) The 90% CI for the median solar intensity is given by the command 
library(BSDA); SIGN.test(x, alternative="two.sided", conf.level=0.95) and the 
result is (704.8189, 728.0892). No assumption is needed. 


(c) The interpretation of the confidence level is wrong for both the CI in (a) and 
the CI in (b). 


7. (a) For the Poisson($\lambda$) distribution, $\mu = \lambda$. Thus, the 95% CI for $\lambda$ is the same as the 
95% CI for $\mu$. We use the following commands: 


x=c(rep(0,4), rep(1,12),rep(2,11),rep(3,14),rep(4,9)); 
confint(lm(x~1), level=0.95) 
to find the CI as (1.888116, 2.591884). 


(b) For the Poisson($\lambda$) distribution, $\sigma^2 = \lambda$. Thus, the 95% CI for $\sigma$ is 
$(\sqrt{1.888116}, \sqrt{2.591884})$, or (1.374087, 1.609933). 


8. (a) The 95% CI for the mean eruption duration is given by the command 
confint(lm(ed~1),level=0.95) and the result is (3.351534, 3.624032). 

(b) The 90% CI for the median eruption duration is given by the command 
library(BSDA); SIGN.test(x, alternative="two.sided", conf.level=0.95) and the 
result is (3.833, 4.1115). 

(c) The sample proportion, $\hat p$, can be found with the R command 
phat=sum(ed>4.42)/length(ed), which gives 0.2683824. Then using the 
following commands 

alpha=0.05; 
phat-qnorm(1-alpha/2)*sqrt(phat*(1-phat)/length(ed)); 
phat+qnorm(1-alpha/2)*sqrt(phat*(1-phat)/length(ed)) 


gives the 95% CI for the probability that an eruption duration will last more 
than 4.42 min as (0.2157221, 0.3210426). 


9. (a) To find the 95% confidence interval for the proportion, p, of customers who 
qualify, we use the following commands: 


n=500; phat = 40/n; alpha=0.05; 
phat-qnorm(1-alpha/2)*sqrt(phat*(1-phat)/n); 
phat+qnorm(1-alpha/2)*sqrt(phat*(1-phat)/n) 


The obtained CI is (0.05622054, 0.1037795). 


(b) In order to make the CI valid, there should be at least 8 customers who qualify 
and at least 8 customers who do not qualify in the sample, which is satisfied 
by the data. 


10. (a) To find the 95% confidence interval for the proportion, p, of young adult US 
citizens who drink beer, wine, or hard liquor on a weekly basis, we use the 
following commands: 


n=1516; phat = 985/n; alpha=0.05; 
phat-qnorm(1-alpha/2)*sqrt(phat*(1-phat)/n); 
phat+qnorm(1-alpha/2)*sqrt(phat*(1-phat)/n) 


The obtained CI is (0.6257221, 0.6737502). 
(b) False 


11. (a) To find the 95% confidence interval for the probability, p, that a randomly 
selected component lasts more than 350 hours, we use the following commands: 


n=37; phat = 24/n; alpha=0.05; 
phat-qnorm(1-alpha/2)*sqrt(phat*(1-phat)/n); 
phat+qnorm(1-alpha/2)*sqrt(phat*(1-phat)/n) 


The obtained CI is (0.4948251, 0.8024722). 


(b) Assuming that the life spans of the two components in the system are 
independent, the probability that the system lasts more than 350 hours is $p^2$. Thus, 
the 95% CI for the probability that the system lasts more than 350 hours is 
$(0.4948251^2, 0.8024722^2)$, or approximately (0.2449, 0.6440). 
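
Exercises 8(c) through 11 all repeat the same three lines with different counts, so the calculation can be wrapped in a small function. This is a sketch of the normal-approximation interval used above; the name prop_ci and its arguments are illustrative.

```r
# Normal-approximation CI for a proportion:
# phat +/- z_{alpha/2} * sqrt(phat * (1 - phat) / n)
prop_ci <- function(successes, n, level = 0.95) {
  phat <- successes / n
  z <- qnorm(1 - (1 - level) / 2)
  margin <- z * sqrt(phat * (1 - phat) / n)
  c(lower = phat - margin, upper = phat + margin)
}

ci_members  <- prop_ci(40, 500)     # Exercise 9: customers who qualify
ci_drinkers <- prop_ci(985, 1516)   # Exercise 10: weekly drinkers

# Exercise 11(b): the map p -> p^2 is increasing on (0, 1), so squaring
# the endpoints of a CI for p gives a CI for p^2 at the same level.
ci_component <- prop_ci(24, 37)
ci_system <- ci_component^2
```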


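12. By (7.3.12), the $(1-\alpha)100\%$ CI for $\mu_{Y|X}(x)$ is 

$$\hat\mu_{Y|X}(x) \pm t_{n-2,\alpha/2}\,S_{\hat\mu_{Y|X}(x)}$$

with 

$$S_{\hat\mu_{Y|X}(x)} = S_\varepsilon\sqrt{\frac{1}{n} + \frac{n(x-\bar X)^2}{n\sum X_i^2 - (\sum X_i)^2}}.$$

Since $\mu_{Y|X}(0) = \alpha_1$ and $\hat\mu_{Y|X}(0) = \hat\alpha_1$, we can let x = 0 in the above formula to 
find the $(1-\alpha)100\%$ CI for $\alpha_1$ as 

$$\hat\alpha_1 \pm t_{n-2,\alpha/2}\,S_\varepsilon\sqrt{\frac{1}{n} + \frac{n\bar X^2}{n\sum X_i^2 - (\sum X_i)^2}}.$$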
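13. (a) From the given information, we have 

$$\hat\beta_1 = \frac{n\sum X_iY_i - (\sum X_i)(\sum Y_i)}{n\sum X_i^2 - (\sum X_i)^2} = \frac{10 \times 2418968 - 3728 \times 5421}{10 \times 1816016 - 3728^2} = 0.9338,$$

$$\hat\alpha_1 = \bar Y - \hat\beta_1\bar X = \frac{5421}{10} - 0.9338 \times \frac{3728}{10} = 193.9643.$$

The LSE for $\sigma^2$ is 

$$S_\varepsilon^2 = \frac{1}{n-2}\left[\sum Y_i^2 - \hat\alpha_1\sum Y_i - \hat\beta_1\sum X_iY_i\right] = \frac{1}{8}\left[3343359 - 193.9643 \times 5421 - 0.9338 \times 2418968\right] = 4130.776.$$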


(b) From the data, we have 

$$S_{\hat\beta_1} = S_\varepsilon\sqrt{\frac{n}{n\sum X_i^2 - (\sum X_i)^2}} = \sqrt{10 \times 4130.776/(10 \times 1816016 - 3728^2)} = 0.09844647,$$

thus the 95% CI for the true slope of the regression line is 

$$\hat\beta_1 \pm t_{n-2,\alpha/2}\,S_{\hat\beta_1} = 0.9338 \pm t_{8,0.025} \times 0.09844647,$$

which is calculated as (0.7068, 1.1608). In order to make the CI valid, we need 
the assumption that the error terms are normally distributed. 


(c) When the surface conductivity is 500, the expected sediment conductivity is 

$$\hat\mu_{Y|X}(500) = \hat\alpha_1 + \hat\beta_1 \times 500 = 193.9643 + 0.9338 \times 500 = 660.8643,$$

and the estimated standard error is 

$$S_{\hat\mu_{Y|X}(x)} = S_\varepsilon\sqrt{\frac{1}{n} + \frac{n(x-\bar X)^2}{n\sum X_i^2 - (\sum X_i)^2}} = \sqrt{4130.776}\,\sqrt{\frac{1}{10} + \frac{10 \times (500-372.8)^2}{10 \times 1816016 - 3728^2}} = 23.87232.$$

Thus, the 95% CI for the expected sediment conductivity at the surface 
conductivity 500 is 

$$\hat\mu_{Y|X}(500) \pm t_{n-2,\alpha/2}\,S_{\hat\mu_{Y|X}(500)} = 660.8643 \pm t_{8,0.025} \times 23.87232,$$

which is calculated as (605.8146, 715.914). 
When the surface conductivity is 900, the expected sediment conductivity is 

$$\hat\mu_{Y|X}(900) = \hat\alpha_1 + \hat\beta_1 \times 900 = 193.9643 + 0.9338 \times 900 = 1034.384,$$

and the estimated standard error is 

$$S_{\hat\mu_{Y|X}(x)} = S_\varepsilon\sqrt{\frac{1}{n} + \frac{n(x-\bar X)^2}{n\sum X_i^2 - (\sum X_i)^2}} = \sqrt{4130.776}\,\sqrt{\frac{1}{10} + \frac{10 \times (900-372.8)^2}{10 \times 1816016 - 3728^2}} = 55.73858.$$

Thus, the 95% CI for the expected sediment conductivity at the surface 
conductivity 900 is 

$$\hat\mu_{Y|X}(900) \pm t_{n-2,\alpha/2}\,S_{\hat\mu_{Y|X}(900)} = 1034.384 \pm t_{8,0.025} \times 55.73858,$$

which is calculated as (905.8506, 1162.917). 


The CI at X = 900 is not appropriate because 900 is not in the range of 
X-values in the data set. 
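
The two intervals in part (c) can be reproduced from the summary statistics with a short function. This sketch plugs in the rounded estimates reported in part (a) (193.9643, 0.9338, and 4130.776) rather than recomputing them, since recomputing with unrounded coefficients would give slightly different values; the function name mean_ci is illustrative.

```r
# Summary statistics and rounded estimates taken from Exercise 13(a)
n <- 10
sum_x <- 3728
sum_x2 <- 1816016
a1 <- 193.9643   # intercept estimate
b1 <- 0.9338     # slope estimate
s2 <- 4130.776   # estimate of the intrinsic error variance

# CI for the regression function at x, following formula (7.3.12)
mean_ci <- function(x, level = 0.95) {
  xbar <- sum_x / n
  fit <- a1 + b1 * x
  se <- sqrt(s2) * sqrt(1 / n + n * (x - xbar)^2 / (n * sum_x2 - sum_x^2))
  margin <- qt(1 - (1 - level) / 2, df = n - 2) * se
  c(fit = fit, se = se, lower = fit - margin, upper = fit + margin)
}

ci500 <- mean_ci(500)
ci900 <- mean_ci(900)  # extrapolation: x = 900 is outside the observed range
```

The standard error more than doubles between x = 500 and x = 900 because the second term under the square root grows with the squared distance from the mean of the X-values.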


14. (a) The scatterplot is given below. 


The scatterplot of the data shows that the linearity and homoscedasticity 
assumptions of the simple linear regression model seem to be valid. 


(b) Using the R commands out=lm(y~x); out$coef, we have the LSE estimates 
$\hat\alpha_1 = 3.130417$ and $\hat\beta_1 = -1.076647$. Using the command 
sqrt(sum(out$resid**2)/out$df.resid), we find the LSE for $\sigma_\varepsilon$ as 0.0298461. 

(c) Using the R command confint(out, level=0.9), we find the 90% CI for $\beta_1$ 
as (-1.279482, -0.873812). Since the mean difference of the strengths at 
water/cement ratios 1.55 and 1.35 is $\mu_{Y|X}(1.55) - \mu_{Y|X}(1.35) = 0.2\beta_1$, we calculate 
the 90% CI for the mean difference of the strengths at water/cement ratios 
1.55 and 1.35 as (-0.2558964, -0.1747624). 


(d) Using the R commands t=data.frame(x=c(1.35,1.45,1.55)); predict(out, t, 
interval="confidence", level=0.9), we find the 90% CIs for the mean strength at 
water/cement ratios 1.35, 1.45, and 1.55 as (1.648821, 1.705066), (1.553986, 
1.584572), and (1.439260, 1.483969), respectively. 


(e) Using the R commands qqnorm(out$resid); qqline(out$resid, col="red"), we 
get the following normal Q-Q plot of the residuals: 


[Figure: Normal Q-Q Plot of the residuals; Sample Quantiles versus Theoretical Quantiles] 


The plot does not suggest serious departure from the normality assumption 
and thus, the above Cls can be reasonably trusted. 


15. (a) The scatterplot is given on the following page. 
The scatterplot of the data shows that the linearity and homoscedasticity 
assumptions of the simple linear regression model seem to be valid. 


(b) Using the R commands out=lm(y~x); confint(out, level=0.95), we find the 
95% CI for $\beta_1$ as (-0.2236649, -0.117264). Using the R commands 
t=data.frame(x=80); predict(out, t, interval="confidence", level=0.95), we 
find the 95% CI for the expected wind speed on an 80F day as (9.082136, 
10.11093). 


16. Using the command 2*(1-pbinom(30-10, 30, 0.5)), we find a = 0.04277395, thus 
the confidence level is (1 — a)100% = 95.72%. 


17. Using the command n=16; a=4; 1-2*(1-pbinom(n-a,n,0.5)), we have the result 
97.87%; changing a to 5, we have the result 92.32%. These are consistent with 
Example 7.3-7. 


18. From (7.3.19), we find the $(1-\alpha)100\%$ CI for $\sigma$ is 

$$\sqrt{\frac{n-1}{\chi^2_{n-1,\alpha/2}}}\,S < \sigma < \sqrt{\frac{n-1}{\chi^2_{n-1,1-\alpha/2}}}\,S.$$

Thus, we can write the R commands as 


n = 15; S = 0.64; a = 0.05; 
L = sqrt((n-1)/qchisq(1-a/2, n-1))*S; U=sqrt((n-1)/qchisq(a/2, n-1))*S; 
We find the 95% CI for $\sigma$ as (0.468561, 1.009343). In order to make the CI valid, 
we need to assume that the population distribution is normal. 


19. From (7.3.19), we find the $(1-\alpha)100\%$ CI for $\sigma$ is 

$$\sqrt{\frac{n-1}{\chi^2_{n-1,\alpha/2}}}\,S < \sigma < \sqrt{\frac{n-1}{\chi^2_{n-1,1-\alpha/2}}}\,S.$$

Thus, we can write the R commands as 


n = 35; S = 0.117; a = 0.05; 
L = sqrt((n-1)/qchisq(1-a/2, n-1))*S; U=sqrt((n-1)/qchisq(a/2, n-1))*S; 


We find the 95% CI for $\sigma$ as (0.09463803, 0.1532936). The traditional value of 0.1 
lies within the CI. 
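
Exercises 18 and 19 apply the same chi-square formula, which can be written once as a function. In this sketch the inputs n = 15 and S = 0.64 are assumed values, chosen because they reproduce the interval reported in Exercise 18; the function name sigma_ci is illustrative.

```r
# Chi-square based CI for a normal population standard deviation:
# sqrt((n-1)/chisq_{n-1, alpha/2}) * S  <  sigma  <  sqrt((n-1)/chisq_{n-1, 1-alpha/2}) * S
# Note: qchisq(p, df) takes a lower-tail probability, so the upper-tail
# percentile chisq_{n-1, alpha/2} is qchisq(1 - alpha/2, n - 1).
sigma_ci <- function(n, s, level = 0.95) {
  a <- 1 - level
  c(lower = sqrt((n - 1) / qchisq(1 - a / 2, n - 1)) * s,
    upper = sqrt((n - 1) / qchisq(a / 2, n - 1)) * s)
}

ci18 <- sigma_ci(15, 0.64)  # assumed inputs consistent with Exercise 18
```

Squaring both endpoints gives the corresponding interval for the variance, which is the form used in Exercise 20.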


20. Using the R commands 


rt=read.table("RobotReactTime.txt", header=T); t2=rt$Time[rt$Robot==2]; 
n = length(t2); S2 = var(t2); a = 0.05; 
L = (n-1)*S2/qchisq(1-a/2, n-1); U=(n-1)*S2/qchisq(a/2, n-1); 


We find the 95% CI for the population variance of reaction times of Robot 2 as 
(0.4916012, 1.696162). 


7.4 The Issue of Precision 


1. We use the R command library(BSDA); nsize(b=0.2, sigma=1.2, conf.level=0.98, 
type="mu"); the desired sample size is 195. 


2. We use the R command library(BSDA); nsize(b=4/2, sigma=18, conf.level=0.9, 
type="mu"); the desired sample size is 220. 


3. (a) We use the R command library(BSDA); nsize(b=0.1/2, p=9/40, conf.level=0.9, 
type="pi"); the desired sample size is 189. 


(b) If no prior information is given, we use the R command library(BSDA); 
nsize(b=0.1/2, p=0.5, conf.level=0.9, type="pi"); the desired sample size is 
271. 


4. (a) We use the R command library(BSDA); nsize(b=0.03, p=75/198, conf.level=0.95, 
type="pi"); the desired sample size is 1015. 

(b) If no prior information is given, we use the R command library(BSDA); 
nsize(b=0.03, p=0.5, conf.level=0.95, type="pi"); the desired sample size is 
1068. 
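
The sample sizes in Exercises 1 through 3 and 4(b) agree with the standard normal-theory formulas, which can be evaluated in base R without the BSDA package. The sketch below is not a reimplementation of nsize; the function names are illustrative, and b denotes the desired half-length of the CI.

```r
# Sample size so that a CI for a mean has half-length at most b:
# n >= (z_{alpha/2} * sigma / b)^2
n_for_mean <- function(b, sigma, level) {
  z <- qnorm(1 - (1 - level) / 2)
  ceiling((z * sigma / b)^2)
}

# Sample size so that a CI for a proportion has half-length at most b:
# n >= z_{alpha/2}^2 * p * (1 - p) / b^2, with p = 0.5 as the conservative choice
n_for_prop <- function(b, p, level) {
  z <- qnorm(1 - (1 - level) / 2)
  ceiling(z^2 * p * (1 - p) / b^2)
}

n1  <- n_for_mean(0.2, 1.2, 0.98)     # Exercise 1
n2  <- n_for_mean(4 / 2, 18, 0.9)     # Exercise 2
n3a <- n_for_prop(0.1 / 2, 9 / 40, 0.9)  # Exercise 3(a)
n3b <- n_for_prop(0.1 / 2, 0.5, 0.9)     # Exercise 3(b)
n4b <- n_for_prop(0.03, 0.5, 0.95)       # Exercise 4(b)
```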
7.5 Prediction Intervals 


1. The problem is to find a prediction interval for the next chocolate chip cookie. 
Using (7.5.4), the prediction interval is 

$$\bar Y \pm t_{n-1,\alpha/2}\,S\sqrt{1 + \frac{1}{n}}.$$

Using the R commands Ybar = 3.1; S = 0.3; n=16; a=0.1; L=Ybar - qt(1-a/2, 
n-1)*S*sqrt(1+1/n); U=Ybar + qt(1-a/2, n-1)*S*sqrt(1+1/n), we find the 90% 
prediction interval as (2.557899, 3.642101). In order to make the PI valid, we need 
to assume that the population distribution is normal. 
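
Formula (7.5.4) depends only on the sample mean, the sample standard deviation, and n, so it is convenient to wrap it in a function. This is a minimal sketch; the name pred_int is illustrative.

```r
# Prediction interval for the next observation from a normal population:
# ybar +/- t_{n-1, alpha/2} * S * sqrt(1 + 1/n)
pred_int <- function(ybar, s, n, level) {
  a <- 1 - level
  margin <- qt(1 - a / 2, n - 1) * s * sqrt(1 + 1 / n)
  c(lower = ybar - margin, upper = ybar + margin)
}

pi_cookie <- pred_int(3.1, 0.3, 16, level = 0.9)  # Exercise 1
```

The only difference from the t-based CI for the mean is the factor sqrt(1 + 1/n) in place of sqrt(1/n), which is why a PI is always longer than the CI at the same level.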


2. (a) Using the R commands Xbar = 30.79; S = 6.53; n=8; a=0.1; L=Xbar - 
qt(1-a/2, n-1)*S*sqrt(1+1/n); U=Xbar + qt(1-a/2, n-1)*S*sqrt(1+1/n), we find 
the 90% prediction interval as (17.66794, 43.91206). In order to make the PI 
valid, we need to assume that the population distribution is normal. 

(b) Using the R commands L=Xbar - qt(1-a/2, n-1)*S/sqrt(n); U=Xbar + qt(1-a/2, 
n-1)*S/sqrt(n), we find the 90% confidence interval for the mean heat flux as 
(26.41598, 35.16402). The prediction interval has a length of 26.24412, and the 
confidence interval has a length of 8.74804. It is clear that the prediction interval is longer. 


3. (a) Using the R command predict(lm(y~1), data.frame(1), interval="predict", 
level=0.95), we find the 95% prediction interval for the compressive strength 
of the next concrete specimen as (41.17, 49.24). 


(b) The normal Q-Q plot for the data is shown as follows: 


[Figure: Normal Q-Q Plot; Sample Quantiles versus Theoretical Quantiles] 

The plot shows the normality assumption holds. 


4. (a) The expected separation distance between the next cyclist, whose distance 
from the roadway center line is 15 feet, and a passing car can be estimated as 

$$\hat\mu_{Y|X=15} = -2.1825 + 0.6603 \times 15 = 7.722.$$

Using formula (7.5.6), we calculate the prediction interval for Y when X = 15 
as 

$$7.722 \pm t_{n-2,\alpha/2}\,S_\varepsilon\sqrt{1 + \frac{1}{10} + \frac{10 \times (15-15.42)^2}{10 \times 2452.18 - 154.2^2}} = (6.59, 8.86).$$


(b) A distance of 12 feet is not in the range of X values in the data set, so the 
desired PI would not be reliable. 


5. (a) We use the command 
predict(lm(y~1), data.frame(1), interval="predict", level=0.95) 


to find (-36.24556, 441.9256) as the prediction interval for the weight of the 
next bear that will be captured during the same time period. Since the weight 
cannot be negative, we use (0, 441.9256) as the prediction interval instead. 


(i) No assumption is needed for the validity of the prediction because $\bar Y$ is the 
best estimate of $\mu$ under the MSE criterion. 

(ii) To make the PI valid, we need to assume that the weights of bears are 
normally distributed. 


(b) We use the command predict(lm(y~x), data.frame(x=40), interval="predict", 
level=0.95) to find (165.9967, 301.5883) as the prediction interval for the 
weight of the next bear that will be captured during the same time period 
if its chest girth measures 40 cm. 


(i) To make the prediction valid, we need to assume that the weight and chest 
girth have a linear relation. 


(ii) The validity of the PI requires that all the assumptions of the normal 
simple linear regression model, that is, the additional assumptions of ho- 
moscedasticity and normality of the intrinsic error variables, be satisfied. 


(c) The prediction interval in part (a) is much longer than that in part (b). 
8.2 Setting Up a Test Procedure 
1. (a) Let $\mu$ be the mean soil heat flux, then the null and alternative hypotheses are 

$$H_0: \mu \le 31 \quad \text{vs.} \quad H_a: \mu > 31.$$


(b) If the null hypothesis is rejected, we should use the coal dust cover. 
2. (a) The null and alternative hypotheses are 

$$H_0: p \le 0.25 \quad \text{vs.} \quad H_a: p > 0.25.$$


(b) If the null hypothesis is rejected, we should adopt the modified bumper design. 


3. (a) If the CEO wants to adopt it unless there is evidence that it has a lower 
protection index, then $H_a: \mu < \mu_0$. 
(b) If the CEO does not want to adopt it unless there is evidence that it has a 
higher protection index, then $H_a: \mu > \mu_0$. 


(c) If the null hypothesis is rejected for part (a), the CEO should not adopt the 
new grille guard and for part (b), the CEO should adopt the new grille guard. 


4. (a) If the manufacturer does not want to buy the new machine unless there is 
evidence it is more productive than the old one, then $H_a: \mu > \mu_0$. 


(b) If the manufacturer wants to buy the new machine unless there is evidence it 
is less productive than the old one, then $H_a: \mu < \mu_0$. 


(c) If the null hypothesis is rejected for part (a), the manufacturer should buy 
the new machine and for part (b) the manufacturer should not buy the new 
machine. 


5. (a) Let p be the proportion of all customers who qualify for membership, then the 
hypotheses are 

$$H_0: p \ge 0.05 \quad \text{vs.} \quad H_a: p < 0.05.$$

(b) If the null hypothesis is rejected, the airline should not proceed with the 
establishment of the traveler's club. 


6. (a) The statement is true. 


(b) We determine $C$ from the requirement that the probability of incorrectly 
rejecting $H_0$ is no more than 0.05 or, in mathematical notation, 

$$P(\bar X > C) \le 0.05 \text{ if } H_0 \text{ is true.}$$

Over the range of $\mu$ values specified by $H_0$ (i.e., $\mu \le 28{,}000$), the probability 
$P(\bar X > C)$ is largest when $\mu = 28{,}000$. Thus, the requirement will be satisfied 
if $C$ is chosen so that when $\mu = 28{,}000$, $P(\bar X > C) = 0.05$. This is achieved by 
choosing $C$ to be the 95th percentile of the distribution of $\bar X$ when $\mu = 28{,}000$. 
Recalling that $\sigma$ is assumed to be known, this yields $C = 28{,}000 + z_{0.05}\,\sigma/\sqrt{n}$. 
(c) Let 

$$Z_{H_0} = \frac{\bar X - 28{,}000}{\sigma/\sqrt{n}}.$$

Then the standardized version of the rejection region is $Z_{H_0} > z_{0.05}$. 
7. (a) The rejection region is of the form $\hat\mu_{Y|X}(x) > C$ for some constant $C$. 
(b) We determine $C$ from the requirement that the probability of incorrectly 
rejecting $H_0$ is no more than 0.05 or, in mathematical notation, 

$$P(\hat\mu_{Y|X}(x) > C) \le 0.05 \text{ if } H_0 \text{ is true.}$$

Thus, the requirement will be satisfied if $C$ is chosen so that under $H_0$ (i.e., 
$\mu_{Y|X}(x) = \mu_{Y|X}(x)_0$), $P(\hat\mu_{Y|X}(x) > C) = 0.05$. This is achieved by choosing 
$C$ to be the 95th percentile of the distribution of $\hat\mu_{Y|X}(x)$ when $\mu_{Y|X}(x) = \mu_{Y|X}(x)_0$. 
Recalling that 

$$\frac{\hat\mu_{Y|X}(x) - \mu_{Y|X}(x)_0}{S_{\hat\mu_{Y|X}(x)}} \sim t_{n-2},$$

this yields the selection $C = \mu_{Y|X}(x)_0 + t_{n-2,0.05}\,S_{\hat\mu_{Y|X}(x)}$. 


8. (a) Let p be the proportion of all detonators that will ignite, then the null and 
alternative hypotheses are 

$$H_0: p \ge 0.9 \quad \text{vs.} \quad H_a: p < 0.9.$$

(b) The standardized test statistic is 

$$Z_{H_0} = \frac{\hat p - 0.9}{\sqrt{0.9 \times 0.1/n}}.$$

(c) The statement is false. 


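9. (a) Since the $(1-\alpha)100\%$ CI for $\beta_1$ is $\hat\beta_1 \pm t_{n-2,\alpha/2}\,S_{\hat\beta_1}$, $H_0: \beta_1 = \beta_{1,0}$ is rejected if 

$$\beta_{1,0} < \hat\beta_1 - t_{n-2,\alpha/2}\,S_{\hat\beta_1} \quad \text{or} \quad \beta_{1,0} > \hat\beta_1 + t_{n-2,\alpha/2}\,S_{\hat\beta_1}.$$

These inequalities could be rewritten as 

$$\frac{\hat\beta_1 - \beta_{1,0}}{S_{\hat\beta_1}} > t_{n-2,\alpha/2} \quad \text{or} \quad \frac{\hat\beta_1 - \beta_{1,0}}{S_{\hat\beta_1}} < -t_{n-2,\alpha/2}.$$

Let 

$$T_{H_0} = \frac{\hat\beta_1 - \beta_{1,0}}{S_{\hat\beta_1}};$$

then the CI-based RR could be written as $|T_{H_0}| > t_{n-2,\alpha/2}$. 
(b) Since the $(1-\alpha)100\%$ CI for $\mu_{Y|X}(x)$ is $\hat\mu_{Y|X}(x) \pm t_{n-2,\alpha/2}\,S_{\hat\mu_{Y|X}(x)}$, 
$H_0: \mu_{Y|X}(x) = \mu_{Y|X}(x)_0$ is rejected if 

$$\mu_{Y|X}(x)_0 < \hat\mu_{Y|X}(x) - t_{n-2,\alpha/2}\,S_{\hat\mu_{Y|X}(x)} \quad \text{or} \quad \mu_{Y|X}(x)_0 > \hat\mu_{Y|X}(x) + t_{n-2,\alpha/2}\,S_{\hat\mu_{Y|X}(x)}.$$

These inequalities could be rewritten as 

$$\frac{\hat\mu_{Y|X}(x) - \mu_{Y|X}(x)_0}{S_{\hat\mu_{Y|X}(x)}} > t_{n-2,\alpha/2} \quad \text{or} \quad \frac{\hat\mu_{Y|X}(x) - \mu_{Y|X}(x)_0}{S_{\hat\mu_{Y|X}(x)}} < -t_{n-2,\alpha/2}.$$

Let 

$$T_{H_0} = \frac{\hat\mu_{Y|X}(x) - \mu_{Y|X}(x)_0}{S_{\hat\mu_{Y|X}(x)}}.$$

Then the CI-based RR could be written as $|T_{H_0}| > t_{n-2,\alpha/2}$. 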
10. (a) The value of the test statistic is

$$Z_{H_0} = \frac{\bar X - \mu_0}{\sigma/\sqrt{n}} = \frac{28640 - 28000}{900/\sqrt{25}} = 3.556.$$

Since the RR is $Z_{H_0} > z_\alpha$, the smallest level at which $H_0$ is rejected is found by solving $3.556 = z_\alpha$ for $\alpha$. Let $\Phi(\cdot)$ be the cumulative distribution function of N(0,1). The solution to this equation, which is also the p-value, is

$$\text{p-value} = 1 - \Phi(3.556) = 0.00019.$$

(b) Since the p-value is less than 0.05, the null hypothesis should be rejected at the 0.05 level of significance.
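The calculation in Exercise 10 can be reproduced in a few lines of base R. This is a minimal sketch; the variable names (`xbar`, `mu0`, `sigma`, `n`) are illustrative labels for the numbers quoted in the solution, not names taken from the text.

```r
# Upper-tailed Z test from summary statistics (values quoted in the solution above).
xbar  <- 28640   # sample mean
mu0   <- 28000   # null value of the mean
sigma <- 900     # standard deviation used in the solution
n     <- 25      # sample size

z <- (xbar - mu0) / (sigma / sqrt(n))   # standardized test statistic
p_value <- 1 - pnorm(z)                 # area to the right of z under N(0,1)

print(round(z, 3))
print(p_value)
```

The p-value is the smallest level at which the upper-tailed test rejects, which is why it is computed as the right-tail area beyond the observed statistic.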
11. (a) The standardized test statistic is

$$Z_{H_0} = \frac{8/50 - 0.25}{\sqrt{0.25 \times 0.75/50}} = -1.469694.$$

To sketch the figure, draw a N(0,1) PDF and shade the area to the right of -1.469694, which represents the p-value.

(b) The p-value is calculated as 0.9292. Since the p-value is greater than 0.05, the null hypothesis should not be rejected at the 0.05 level of significance.


8.3 Types of Tests 


1. (a) The value of the test statistic is

$$T_{H_0} = \frac{\bar X - \mu_0}{S/\sqrt{n}} = \frac{9.8 - 9.5}{1.095/\sqrt{50}} = 1.94.$$

The RR is $T_{H_0} > t_{n-1,\alpha} = t_{49,0.05} = 1.68$. Since 1.94 > 1.68, we should reject $H_0$.

(b) Since the sample size n = 50 > 30, no additional assumptions are needed.


2. (a) Let $\mu$ be the average permissible exposure, then the null and alternative hypotheses are

$$H_0: \mu \le \mu_0 \quad \text{vs.} \quad H_a: \mu > \mu_0.$$

(b) The value of the test statistic is

$$T_{H_0} = \frac{\bar X - \mu_0}{S/\sqrt{36}} = 1.61.$$

The RR is $T_{H_0} > t_{n-1,\alpha} = t_{35,0.05} = 1.69$. Since 1.69 > 1.61, we should not reject $H_0$. Since the sample size n = 36 > 30, no additional assumptions are needed.

(c) By using Table A4, the p-value should be between 0.05 and 0.1. Using the R command 1-pt(1.61, 35), the exact p-value is 0.058.


3. (a) Let $\mu$ be the (population) mean penetration, then the null and alternative hypotheses are

$$H_0: \mu \le 50 \quad \text{vs.} \quad H_a: \mu > 50.$$

(b) The value of the test statistic is

$$T_{H_0} = \frac{\bar X - \mu_0}{S/\sqrt{n}} = \frac{52.7 - 50}{4.8/\sqrt{16}} = 2.25.$$

The RR is $T_{H_0} > t_{n-1,\alpha} = t_{15,0.1} = 1.34$. Since 2.25 > 1.34, we should reject $H_0$. In order to make the test valid, we need the assumption that the population is normal.

(c) By using Table A4, the p-value should be between 0.01 and 0.025. Using the R command 1-pt(2.25, 15), the exact p-value is 0.02.
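The one-sample t computations in the exercises above all follow the same pattern, so they can be wrapped in a small helper. The sketch below assumes an upper-tailed alternative; `t_test_summary` is a hypothetical helper name, not a function from base R or from the text.

```r
# One-sample, upper-tailed t test from summary statistics.
# t_test_summary is a hypothetical helper, not a base R function.
t_test_summary <- function(xbar, mu0, s, n, alpha) {
  stat <- (xbar - mu0) / (s / sqrt(n))   # T statistic
  crit <- qt(1 - alpha, df = n - 1)      # critical value t_{n-1, alpha}
  p    <- 1 - pt(stat, df = n - 1)       # upper-tail p-value
  list(statistic = stat, critical = crit, p.value = p, reject = stat > crit)
}

# Numbers from the penetration exercise: mean 52.7, null value 50, S = 4.8, n = 16, level 0.1.
res <- t_test_summary(xbar = 52.7, mu0 = 50, s = 4.8, n = 16, alpha = 0.1)
print(res)
```

When the raw data are available, `t.test(x, mu = 50, alternative = "greater")` performs the same test directly.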


4. (a) The value of the test statistic is

$$T_{H_0} = \frac{\bar X - \mu_0}{S/\sqrt{n}} = \frac{30.79 - 29}{6.53/\sqrt{8}} = 0.775.$$

The RR is $T_{H_0} > t_{n-1,\alpha} = t_{7,0.05} = 1.89$. Since 1.89 > 0.775, we should not reject $H_0$. Using the R command 1-pt(0.775, 7), the exact p-value is 0.2318.

(b) In order to make the test valid, we need the assumption that the population is normal.


5. (a) The standardized test statistic is

$$Z_{H_0} = \frac{40/500 - 0.05}{\sqrt{0.05 \times 0.95/500}} = 3.0779.$$

The RR is $Z_{H_0} < -z_\alpha = -z_{0.01} = -2.33$. Since 3.0779 > -2.33, we should not reject $H_0$, and thus the traveler's club should be established.

(b) The p-value is calculated as 0.999 by using the R command pnorm(3.0779).


6. (a) Let $p$ be the proportion of all customers in states east of the Mississippi who prefer the bisque color, then the null and alternative hypotheses are

$$H_0: p \le 0.3 \quad \text{vs.} \quad H_a: p > 0.3.$$

(b) The standardized test statistic is

$$Z_{H_0} = \frac{185/500 - 0.3}{\sqrt{0.3 \times 0.7/500}} = 3.416.$$

The RR is $Z_{H_0} > z_\alpha = z_{0.05} = 1.645$. Since 3.416 > 1.645, we should reject $H_0$ at level 0.05.

(c) The p-value is calculated as 0.0003 by using the R command 1-pnorm(3.416). Since the p-value is less than the significance level $\alpha = 0.01$, we should reject $H_0$.
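The large-sample test for a proportion used in the last few exercises can be sketched as a short function. `prop_z_test` is a hypothetical helper name chosen for illustration, and the sketch covers only the upper-tailed alternative.

```r
# Upper-tailed large-sample Z test for a proportion.
# prop_z_test is a hypothetical helper, not a base R function.
prop_z_test <- function(x, n, p0) {
  phat <- x / n                                  # sample proportion
  z <- (phat - p0) / sqrt(p0 * (1 - p0) / n)     # standardized statistic under H0
  list(statistic = z, p.value = 1 - pnorm(z))
}

# Bisque color exercise: 185 of 500 customers, null value 0.3.
res <- prop_z_test(185, 500, 0.3)
print(res)
```

Note that the standard error uses the null value of the proportion rather than the sample proportion, matching the formula in the solution.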


7. (a) Let $p$ be the proportion of all consumers who would be willing to try this new product, then the null and alternative hypotheses are

$$H_0: p \le 0.2 \quad \text{vs.} \quad H_a: p > 0.2.$$

(b) The standardized test statistic is

$$Z_{H_0} = \frac{\hat p - 0.2}{\sqrt{0.2 \times 0.8/42}} = 0.23.$$

The RR is $Z_{H_0} > z_\alpha = z_{0.01} = 2.33$. Since 0.23 < 2.33, we should not reject $H_0$. The p-value is calculated as 0.41 by using the R command 1-pnorm(0.23). Thus, there is not enough evidence that the marketing would be profitable.
8. (a) The estimated regression line is $\hat y = 66.3251 + 0.2494x$.
(b) The scatterplot is shown as

[Figure: scatterplot; x-axis: Exposure Temperature]

The scatterplot of the data shows that the linearity and homoscedasticity assumptions of the simple linear regression model seem to be valid.

The normal Q-Q plot of the residuals is shown as

[Figure: Normal Q-Q Plot; axes: Theoretical Quantiles, Sample Quantiles]

The Q-Q plot shows that the normality assumption seems to be valid.
(c) From the command anova(out), SSE = 233.88 and SSR = 108.25, so SST = SSE + SSR = 233.88 + 108.25 = 342.13. The proportion of the total variability explained by the regression model is $R^2 = SSR/SST = 108.25/342.13 = 0.3164$.

(d) The value of the F statistic is 6.0169. The command qf(0.95, 1, 13) returns $F_{1,13,0.05} = 4.667$. Since F = 6.0169 > 4.667, the null hypothesis is rejected at level 0.05.

(e) The null and alternative hypotheses are

$$H_0: \beta_1 \le 0 \quad \text{vs.} \quad H_a: \beta_1 > 0.$$

From the command summary(out), the value of the T test statistic is 2.453. Thus, the p-value is returned by 1-pt(2.453, 13), which is 0.0145. Since the p-value is less than the significance level $\alpha = 0.05$, we would reject $H_0$ and conclude that the decrease in temperature would weaken the concrete.
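The quantities in parts (c) to (e) can be recomputed from the sums of squares alone. The sketch below is an illustration under one stated assumption: the sample size `n = 15` is inferred from the 13 residual degrees of freedom quoted above, and the variable names are illustrative.

```r
# Regression summary quantities from the ANOVA sums of squares quoted above.
sse <- 233.88          # error sum of squares
ssr <- 108.25          # regression sum of squares
n   <- 15              # assumed: 13 residual degrees of freedom + 2

sst   <- sse + ssr                     # total sum of squares
r2    <- ssr / sst                     # coefficient of determination
f     <- ssr / (sse / (n - 2))         # F statistic = MSR / MSE
fcrit <- qf(0.95, 1, n - 2)            # 5% critical value of F(1, n-2)
p_one <- 1 - pt(2.453, n - 2)          # one-sided p-value for the slope test

print(c(R2 = r2, F = f, Fcrit = fcrit, p = p_one))
```

In simple linear regression the F statistic is the square of the T statistic for the slope, which is why 6.0169 is approximately 2.453 squared.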


9. (a) The scatterplot is shown as

[Figure: scatterplot; axes: maximal heart rate reserve, maximal oxygen consumption]

The scatterplot of the data shows that the linearity and homoscedasticity assumptions of the simple linear regression model seem to be valid.

The normal Q-Q plot of the residuals is shown as

[Figure: Normal Q-Q Plot; axes: Theoretical Quantiles, Sample Quantiles]

The Q-Q plot shows that the normality assumption seems to be valid, although there are several outliers.

(b) The completed ANOVA table is

           DF    Sum Sq   Mean Sq   F value   Pr(>F)
x           1   10630.6   10630.6    925.44   <2.2e-16
Residuals  24     275.7      11.5

The proportion of the total variability in oxygen consumption explained by the regression model is $R^2 = 0.9747$.


(c) The fitted regression line is $\hat y = -0.4021 + 1.0200x$. When the percentage of maximal heart rate reserve increases by 10 points, the estimated average oxygen consumption increases by 10.2.


(d)
(i) The null and alternative hypotheses are

$$H_0: \beta_1 \le 1 \quad \text{vs.} \quad H_a: \beta_1 > 1.$$

(ii) From the command summary(out), we have $\hat\beta_1 = 1.02$ and $S_{\hat\beta_1} = 0.03353$. Thus, the value of the test statistic is

$$T_{H_0} = \frac{1.02 - 1}{0.03353} = 0.5965.$$

Thus, the p-value is returned by 1-pt(0.5965, 24), which is 0.278. Since the p-value is greater than the significance level $\alpha = 0.05$, we would not reject $H_0$.

10. (a) The sample size n = 48 + 2 = 50.


(b) The estimate of the standard deviation of the intrinsic error is $\sqrt{MSE} = \sqrt{11354/48} = 15.38$.

(c) For the test $H_0: \alpha_1 = 0$ vs. $H_a: \alpha_1 \ne 0$, the value of the T test statistic is -2.601 and the corresponding p-value is 0.0123. For the test $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \ne 0$, the value of the T test statistic is 9.464 and the corresponding p-value is $1.490 \times 10^{-12}$.

(d) The completed ANOVA table is

           DF   Sum Sq   Mean Sq    F value     Pr(>F)
x           1    21186     21186   89.56562   1.490e-12
Residuals  48    11354  236.5417

(e) The proportion of the total variability of the stopping distance explained by the regression model is $R^2 = 0.6511$.


11. (a) Let $\tilde\mu$ be the median income, then the null and alternative hypotheses are

$$H_0: \tilde\mu = 300 \quad \text{vs} \quad H_a: \tilde\mu \ne 300.$$

The converted hypotheses are

$$H_0: p = 0.5 \quad \text{vs} \quad H_a: p \ne 0.5.$$

Copy the data into the R object x and use the command sum(x>300)/length(x), giving us $\hat p = 0.3$. The test statistic is

$$Z_{H_0} = \frac{\hat p - 0.5}{0.5/\sqrt{n}} = \frac{0.3 - 0.5}{0.5/\sqrt{20}} = -1.789.$$

The p-value is returned by 2*pnorm(-1.789), which is 0.0736. Since the p-value is greater than the significance level $\alpha = 0.05$, we would not reject $H_0$ and conclude that the data do not present strong enough evidence that the claim is false.

(b) Let $\tilde\mu$ be the median income, then the null and alternative hypotheses are

$$H_0: \tilde\mu = 300 \quad \text{vs} \quad H_a: \tilde\mu < 300.$$

The converted hypotheses are

$$H_0: p = 0.5 \quad \text{vs} \quad H_a: p < 0.5.$$

Copy the data into the R object x and use the command sum(x>300)/length(x), giving us $\hat p = 0.3$. The test statistic is

$$Z_{H_0} = \frac{\hat p - 0.5}{0.5/\sqrt{n}} = \frac{0.3 - 0.5}{0.5/\sqrt{20}} = -1.789.$$

The p-value is returned by pnorm(-1.789), which is 0.0368. Since the p-value is less than the significance level $\alpha = 0.05$, we would reject $H_0$ and conclude that the data present strong enough evidence that the median income is less than 300.
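Both parts of the median exercise use the same sign-test statistic and differ only in which tail supplies the p-value. A minimal sketch in base R, using the values 0.3 and 20 quoted in the solution (`sign_test_z` is a hypothetical helper name):

```r
# Large-sample sign test for a median: under H0 the proportion of
# observations above the hypothesized median is 0.5.
# sign_test_z is a hypothetical helper, not a base R function.
sign_test_z <- function(phat, n) (phat - 0.5) / (0.5 / sqrt(n))

z <- sign_test_z(phat = 0.3, n = 20)
p_two_sided <- 2 * pnorm(-abs(z))   # alternative: median differs from the null value
p_lower     <- pnorm(z)             # alternative: median is below the null value

print(c(z = z, two.sided = p_two_sided, lower = p_lower))
```

The two-sided p-value is exactly twice the lower-tail one here, which explains why the same data fail to reject in part (a) but reject in part (b) at level 0.05.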


12. (a) The normal Q-Q plot of the data is shown as

[Figure: Normal Q-Q Plot; axes: Theoretical Quantiles, Sample Quantiles]

The Q-Q plot shows that the data do not come from a normal distribution. We also notice that the dataset size is 22. Thus, it is not appropriate to test $H_0: \mu = 28$ vs. $H_a: \mu > 28$. Instead, we should test the median, $H_0: \tilde\mu = 28$ vs. $H_a: \tilde\mu > 28$.

(b) We first calculate $\hat p$ by the command sum(r2>28)/length(r2), which is 1. The test statistic is

$$Z_{H_0} = \frac{\hat p - 0.5}{0.5/\sqrt{n}} = \frac{1 - 0.5}{0.5/\sqrt{22}} = 4.69.$$

The p-value is returned by 1-pnorm(4.69), which is $1.37 \times 10^{-6}$. Since the p-value is less than the significance level $\alpha = 0.05$, we would reject $H_0$.


13. (a) The original hypotheses are

$$H_0: x_{0.75} = 250 \quad \text{vs} \quad H_a: x_{0.75} < 250,$$

since according to the problem, we have $(1 - \pi)100 = 25$, thus $\pi = 0.75$. The original hypotheses are transformed to

$$H_0: p = 0.75 \quad \text{vs} \quad H_a: p < 0.75,$$

where $p$ is the probability that an observation is larger than 250. In this problem, we calculate $\hat p$ by the command sum(x>250)/length(x), which returns 0.6875. Thus, the test statistic is

$$Z_{H_0} = \frac{\hat p - \pi}{\sqrt{\pi(1-\pi)/n}} = \frac{0.6875 - 0.75}{\sqrt{0.75 \times 0.25/16}} = -0.577.$$

The p-value is calculated by pnorm(-0.577), which returns 0.282. Since the p-value is greater than the significance level $\alpha = 0.05$, we cannot reject $H_0$; that is, there is not enough evidence to show the 25th percentile is smaller than 250.


(b) The original hypotheses are

$$H_0: x_{0.25} = 104 \quad \text{vs} \quad H_a: x_{0.25} > 104,$$

since, according to the problem, we have $(1 - \pi)100 = 75$, thus $\pi = 0.25$. The original hypotheses are transformed to

$$H_0: p = 0.25 \quad \text{vs} \quad H_a: p > 0.25,$$

where $p$ is the probability that an observation is larger than 104. In this problem, we calculate $\hat p$ by the command sum(x>104)/length(x), which returns 0.3333. Thus, the test statistic is

$$Z_{H_0} = \frac{\hat p - \pi}{\sqrt{\pi(1-\pi)/n}} = \frac{0.3333 - 0.25}{\sqrt{0.25 \times 0.75/36}} = 1.154.$$

The p-value is calculated by 1-pnorm(1.154), which returns 0.1243. Since the p-value is greater than the significance level $\alpha = 0.05$, we cannot reject $H_0$; that is, there is not enough evidence to show the 75th percentile of the systolic blood pressure is greater than 104.


14. (a) Let $\sigma$ be the standard deviation of tread life span of the new tires, then the null and alternative hypotheses are

$$H_0: \sigma \le 2.5 \quad \text{vs} \quad H_a: \sigma > 2.5.$$

(b) The sample variance of the tread life spans of a sample of 20 new tires is calculated as 10.47185. To test the hypotheses in part (a), we calculate the value of the test statistic as

$$\chi^2_{H_0} = \frac{(n-1)S^2}{\sigma_0^2} = \frac{(20-1) \times 10.47185}{2.5^2} = 31.83.$$

Since $H_a$ is $\sigma > 2.5$, the RR is $\chi^2_{H_0} > \chi^2_{n-1,\alpha} = 32.85$. Thus, the calculated value is not in the rejection region and the null hypothesis should not be rejected. The p-value is calculated by the R command 1-pchisq(31.83, 19), which gives 0.033. We conclude that the new design should be adopted.
(c) The normal Q-Q plot of the data is shown as

[Figure: Normal Q-Q Plot; axes: Theoretical Quantiles, Sample Quantiles]

The Q-Q plot shows that the data do not come from a normal distribution. Thus, the above conclusion might not be reliable.


15. In this problem, n = 36, S = 0.25, and $\sigma_0 = 0.2$. Thus the value of the test statistic is

$$\chi^2_{H_0} = \frac{(n-1)S^2}{\sigma_0^2} = \frac{(36-1) \times 0.25^2}{0.2^2} = 54.69.$$

Since $H_a$ is $\sigma \ne 0.2$, the RR is $\chi^2_{H_0} > \chi^2_{35,\alpha/2} = 53.20$ or $\chi^2_{H_0} < \chi^2_{35,1-\alpha/2} = 20.57$. Thus, the calculated value is in the rejection region and the null hypothesis should be rejected. The p-value is calculated by the R command 2*min(pchisq(54.69,35), 1-pchisq(54.69,35)), which gives 0.036.
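The two-sided chi-square test for a standard deviation can be sketched as follows. `var_chisq_test` is a hypothetical helper name, and the default level of 0.05 is an assumption consistent with the critical value 53.20 quoted above.

```r
# Two-sided chi-square test for a normal standard deviation.
# var_chisq_test is a hypothetical helper, not a base R function.
var_chisq_test <- function(s, sigma0, n, alpha = 0.05) {
  df    <- n - 1
  stat  <- df * s^2 / sigma0^2                  # (n-1) S^2 / sigma0^2
  lower <- qchisq(alpha / 2, df)                # lower critical value
  upper <- qchisq(1 - alpha / 2, df)            # upper critical value
  p     <- 2 * min(pchisq(stat, df), 1 - pchisq(stat, df))
  list(statistic = stat, lower = lower, upper = upper, p.value = p,
       reject = stat < lower || stat > upper)
}

res <- var_chisq_test(s = 0.25, sigma0 = 0.2, n = 36)
print(res)
```

As the Q-Q plot remark in the preceding exercise suggests, this procedure relies on normality of the data, so the result should be read with that assumption in mind.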


8.4 Precision in Hypothesis Testing 


1. (a) When a null hypothesis is rejected, there is a risk of committing a Type I error.
(b) When a null hypothesis is not rejected, there is a risk of committing a Type II error.
2. (a) True 
(b) False 
(c) False 
3. (a) To calculate the probability of Type I error, we have

$$P(\text{Type I Error}) = P(H_0 \text{ is rejected when it is true}) = P(X \ge 8 \mid p = 0.25, n = 20) = \sum_{k=8}^{20} \binom{20}{k} 0.25^k\, 0.75^{20-k},$$

because under this situation, the random variable $X$ has a binomial distribution with n = 20 and p = 0.25. This probability can be calculated using the R command 1-pbinom(7,20,0.25), which gives us 0.1018 as the probability of Type I error.

(b) We first calculate the probability of Type II error as

$$P(\text{Type II Error when } p = 0.3) = P(H_0 \text{ is not rejected when } p = 0.3) = P(X < 8 \mid p = 0.3, n = 20) = \sum_{k=0}^{7} \binom{20}{k} 0.3^k\, 0.7^{20-k},$$

because under this situation, the random variable $X$ has a binomial distribution with n = 20 and p = 0.3. This probability can be calculated using the R command pbinom(7,20,0.3), which gives us 0.7723 as the probability of Type II error. Finally, the power is 1 - 0.7723 = 0.2277.

(c) When n = 50 and the rejection region is $X \ge 17$, the probability of Type I error can be found by the command 1-pbinom(16,50,0.25), which gives us 0.0983. The power at p = 0.3 can be found by the command 1-pbinom(16,50,0.3), which gives us 0.316. We find that as the sample size increases, we have a smaller probability of Type I error and more power.
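The error probabilities in this exercise are all binomial tail areas, so they can be checked side by side. A minimal sketch in base R; the variable names are illustrative, and the power for n = 50 is evaluated with a sample size of 50 in `pbinom`.

```r
# Type I error, Type II error, and power for the two rejection rules.
alpha20 <- 1 - pbinom(7, 20, 0.25)    # P(X >= 8)  when p = 0.25, n = 20
beta20  <- pbinom(7, 20, 0.3)         # P(X <= 7)  when p = 0.3,  n = 20
power20 <- 1 - beta20

alpha50 <- 1 - pbinom(16, 50, 0.25)   # P(X >= 17) when p = 0.25, n = 50
power50 <- 1 - pbinom(16, 50, 0.3)    # P(X >= 17) when p = 0.3,  n = 50

print(round(c(alpha20 = alpha20, power20 = power20,
              alpha50 = alpha50, power50 = power50), 4))
```

Because X is integer valued, P(X >= c) is obtained as `1 - pbinom(c - 1, n, p)`, which is why the commands use 7 and 16 rather than 8 and 17.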


4. (a) The R command 1-pwr.t.test(36, (2-1)/4.1, 0.05, power=NULL, "one.sample", "greater")$power returns 0.583 as the probability of Type II error when the true concentration is 2 ppm.

(b) The R command pwr.t.test(n=NULL, (2-1)/4.1, 0.05, 0.99, "one.sample", alternative="greater") returns a sample size of 266.46, which is rounded up to 267.


5. In this problem, the hypotheses tell us that $\mu_0 = 8.5$. We require that the probability of delivering a batch of acidity 8.65 should not exceed 0.05; thus, $\mu_a = 8.65$ and the probability of Type II error is 0.05, so the power is 0.95. We also know that the standard deviation from a preliminary study is 0.4; therefore, $S_{pr} = 0.4$. Combining this information, we use the commands library(pwr); pwr.t.test(n=NULL, (8.65-8.5)/0.4, 0.05, 0.95, "one.sample", alternative="greater"), which returns a sample size of 78.33 that is rounded up to 79.


6. (a) In this test, the test statistic is

$$Z_{H_0} = \frac{\hat p - p_0}{\sqrt{p_0(1 - p_0)/n}}$$

and because of $H_a: p > 0.2$, the rejection region is $Z_{H_0} > z_\alpha$. Thus, the probability of Type II error at $p_a = 0.25$ is

$$\beta(0.25) = P(\text{Type II error} \mid p = 0.25) = P\left(\frac{\hat p - p_0}{\sqrt{p_0(1 - p_0)/n}} < z_\alpha \,\Big|\, p = 0.25\right)$$

$$= P\left(\hat p < p_0 + z_\alpha\sqrt{p_0(1 - p_0)/n} \,\Big|\, p = 0.25\right) = \Phi\left(\frac{p_0 + z_\alpha\sqrt{p_0(1 - p_0)/n} - p_a}{\sqrt{p_a(1 - p_a)/n}}\right).$$

For the calculation, we use the command
pnorm((0.2+qnorm(1-0.01)*sqrt(0.2*0.8/42)-0.25)/sqrt(0.25*0.75/42))
and it gives 0.9193 as the probability of Type II error.

(b) To achieve power of 0.3 at $p_a = 0.25$ while keeping the level of significance at 0.01, we should use the commands library(pwr); h=2*asin(sqrt(0.25))-2*asin(sqrt(0.2)); pwr.p.test(h, n=NULL, 0.01, 0.3, alternative="greater"). The code returns a sample size of 225.85, which is rounded up to 226.


7. (a) In this test, the test statistic is

$$Z_{H_0} = \frac{\hat p - p_0}{\sqrt{p_0(1 - p_0)/n}}$$

and because of $H_a: p < 0.05$, the rejection region is $Z_{H_0} < -z_\alpha$. Thus, the probability of Type II error at $p_a = 0.04$ is

$$\beta(0.04) = P(\text{Type II error} \mid p = 0.04) = P\left(\frac{\hat p - p_0}{\sqrt{p_0(1 - p_0)/n}} > -z_\alpha \,\Big|\, p = 0.04\right)$$

$$= P\left(\hat p > p_0 - z_\alpha\sqrt{p_0(1 - p_0)/n} \,\Big|\, p = 0.04\right) = 1 - \Phi\left(\frac{p_0 - z_\alpha\sqrt{p_0(1 - p_0)/n} - p_a}{\sqrt{p_a(1 - p_a)/n}}\right).$$

For the calculation, we use the command
1-pnorm((0.05-qnorm(1-0.01)*sqrt(0.05*0.95/500)-0.04)/sqrt(0.04*0.96/500))
and it gives 0.926 as the probability of Type II error.

(b) To achieve power of 0.5 at $p_a = 0.04$ while keeping the level of significance at 0.01, we should use the commands library(pwr); h=2*asin(sqrt(0.04))-2*asin(sqrt(0.05)); pwr.p.test(h, n=NULL, 0.01, 0.5, alternative="less"). The code returns a sample size of 2318.77, which is rounded up to 2319.
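The two Type II error formulas in Exercises 6 and 7 differ only in the direction of the alternative, so one function can cover both. This is a sketch in base R using the normal approximation shown in the solutions; `type2_prop` is a hypothetical helper name.

```r
# Type II error probability of the large-sample Z test for a proportion.
# type2_prop is a hypothetical helper, not a base R or pwr function.
type2_prop <- function(p0, pa, n, alpha, alternative = c("greater", "less")) {
  alternative <- match.arg(alternative)
  se0 <- sqrt(p0 * (1 - p0) / n)   # standard error under the null value
  sea <- sqrt(pa * (1 - pa) / n)   # standard error under the alternative value
  za  <- qnorm(1 - alpha)
  if (alternative == "greater") {
    pnorm((p0 + za * se0 - pa) / sea)
  } else {
    1 - pnorm((p0 - za * se0 - pa) / sea)
  }
}

beta6 <- type2_prop(0.2, 0.25, 42, 0.01, "greater")   # Exercise 6(a)
beta7 <- type2_prop(0.05, 0.04, 500, 0.01, "less")    # Exercise 7(a)
print(round(c(beta6 = beta6, beta7 = beta7), 4))
```

Both error probabilities are above 0.9, which is what motivates the sample size calculations in parts (b).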
Chapter 9


Comparing Two Populations


9.2 Two-Sample Tests and CIs for Means


1. Let $\mu_1$ be the mean fatigue crack growth in aluminum with thickness of 3 mm and $\mu_2$ be the mean fatigue crack growth in aluminum with thickness of 15 mm.

(a) The null and alternative hypotheses are

$$H_0: \mu_1 = \mu_2 \quad \text{vs} \quad H_a: \mu_1 \ne \mu_2.$$

In order to use (9.2.14) for the testing problem, we need to make sure that the assumption $\sigma_1^2 = \sigma_2^2$ holds. According to (9.2.9),

$$\frac{\max(S_1^2, S_2^2)}{\min(S_1^2, S_2^2)} = \frac{15533^2}{3954^2} > 2.$$

Thus, the assumption $\sigma_1^2 = \sigma_2^2$ does not hold and, consequently, (9.2.14) cannot be used.

(b) We should use the statistic given in (9.2.15), with

$$T^{SS}_{H_0} = \frac{\bar X_1 - \bar X_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} = \frac{160592 - 159778}{\sqrt{\frac{3954^2}{36} + \frac{15533^2}{42}}} = 0.3275.$$

The degrees of freedom are

$$\nu = \left[\frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{(S_1^2/n_1)^2}{n_1 - 1} + \frac{(S_2^2/n_2)^2}{n_2 - 1}}\right] = \left[\frac{\left(\frac{3954^2}{36} + \frac{15533^2}{42}\right)^2}{\frac{(3954^2/36)^2}{36 - 1} + \frac{(15533^2/42)^2}{42 - 1}}\right] = [47.12] = 47.$$

From the R command qt(1-0.025,47) we get $t_{0.025,47} = 2.012$; since $|T^{SS}_{H_0}| < t_{0.025,47}$, we should not reject $H_0$ at the 5% level. The p-value is calculated by the command 2*(1-pt(0.3275,47)), which gives us 0.745. Since both samples have size greater than 30, there is no additional assumption for the validity of the test procedure.

(c) The 95% CI for the difference in the two means is given by (9.2.10):

$$\bar X_1 - \bar X_2 \pm t_{\nu,\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} = 160592 - 159778 \pm t_{47,0.05/2}\sqrt{\frac{3954^2}{36} + \frac{15533^2}{42}},$$

which gives the CI (-4187, 5815). Since 0 is included in the CI, $H_0$ could not be rejected.
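The unequal-variance calculation in this exercise can be reproduced from the summary statistics alone. The sketch below follows the solution in truncating the degrees of freedom to an integer; `welch_summary` is a hypothetical helper name, and the argument names are illustrative.

```r
# Two-sample t test without the equal-variance assumption, from summary statistics.
# welch_summary is a hypothetical helper, not a base R function.
welch_summary <- function(m1, m2, s1, s2, n1, n2, conf = 0.95) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  se   <- sqrt(v1 + v2)                                   # standard error of the difference
  stat <- (m1 - m2) / se
  df   <- floor((v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1)))
  tq   <- qt(1 - (1 - conf) / 2, df)
  list(statistic = stat, df = df,
       p.value = 2 * (1 - pt(abs(stat), df)),
       ci = c(m1 - m2 - tq * se, m1 - m2 + tq * se))
}

res <- welch_summary(160592, 159778, 3954, 15533, 36, 42)
print(res)
```

With the raw data, `t.test(x1, x2)` performs the same test by default, although it keeps the fractional degrees of freedom rather than truncating them.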


2. Let $\mu_1$ be the average penetration for material A and $\mu_2$ be the average penetration for material B.

(a) The null and alternative hypotheses are

$$H_0: \mu_1 - \mu_2 \le 0.1 \quad \text{vs} \quad H_a: \mu_1 - \mu_2 > 0.1.$$

In order to use (9.2.14) for the testing problem, we need to make sure that the assumption $\sigma_1^2 = \sigma_2^2$ holds. According to (9.2.9),

$$\frac{\max(S_1^2, S_2^2)}{\min(S_1^2, S_2^2)} = \frac{0.19^2}{0.16^2} = 1.41 < 2.$$

Thus, the assumption $\sigma_1^2 = \sigma_2^2$ holds and, consequently, (9.2.14) can be used.

(b) We should use the statistic given in (9.2.14). We first calculate the pooled estimate of the variance as

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{(42 - 1) \times 0.19^2 + (42 - 1) \times 0.16^2}{42 + 42 - 2} = 0.03085$$

and the test statistic is

$$T^{EV}_{H_0} = \frac{\bar X_1 - \bar X_2 - \Delta_0}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.49 - 0.36 - 0.1}{\sqrt{0.03085 \times \left(\frac{1}{42} + \frac{1}{42}\right)}} = 0.7827.$$

The degrees of freedom are $\nu = n_1 + n_2 - 2 = 82$. From the R command qt(1-0.05, 82), we get $t_{0.05,82} = 1.6636$; thus we should not reject $H_0$ at the 5% level. The p-value is calculated by the command 1-pt(0.7827, 82), which gives us 0.218. Since both samples have size greater than 30, there is no additional assumption for the validity of the test procedure.

(c) The 95% CI for the difference in the two means is given by (9.2.10):

$$\bar X_1 - \bar X_2 \pm t_{\nu,\alpha/2}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = 0.49 - 0.36 \pm t_{82,0.05/2}\sqrt{0.03085 \times \left(\frac{1}{42} + \frac{1}{42}\right)},$$

which gives the CI (-0.0462, 0.1062).
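The pooled-variance test statistic used in this and the following exercises can be computed from summary statistics with a short function. `pooled_t_summary` is a hypothetical helper name; the sketch covers the upper-tailed alternative with a general null difference `delta0`.

```r
# Pooled-variance two-sample t test (upper-tailed) from summary statistics.
# pooled_t_summary is a hypothetical helper, not a base R function.
pooled_t_summary <- function(m1, m2, s1, s2, n1, n2, delta0 = 0) {
  df   <- n1 + n2 - 2
  sp2  <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df   # pooled variance
  se   <- sqrt(sp2 * (1 / n1 + 1 / n2))
  stat <- (m1 - m2 - delta0) / se
  list(sp2 = sp2, statistic = stat, df = df, p.value = 1 - pt(stat, df))
}

# Penetration exercise: means 0.49 and 0.36, SDs 0.19 and 0.16, n = 42 each, null difference 0.1.
res <- pooled_t_summary(0.49, 0.36, 0.19, 0.16, 42, 42, delta0 = 0.1)
print(res)
```

With raw data, the equivalent call is `t.test(x1, x2, mu = 0.1, var.equal = TRUE, alternative = "greater")`.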
3. Let $\mu_1$ be the average delivery time for the standard route and $\mu_2$ be that of the new route.

(a) The null and alternative hypotheses are

$$H_0: \mu_1 \le \mu_2 \quad \text{vs} \quad H_a: \mu_1 > \mu_2.$$

In order to use (9.2.14) for the testing problem, we need to make sure that the assumption $\sigma_1^2 = \sigma_2^2$ holds. According to (9.2.9),

$$\frac{\max(S_1^2, S_2^2)}{\min(S_1^2, S_2^2)} = \frac{20.38^2}{15.62^2} = 1.70 < 2.$$

Thus, the assumption $\sigma_1^2 = \sigma_2^2$ holds and, consequently, (9.2.14) can be used.

(b) We should use the statistic given in (9.2.14). We first calculate the pooled estimate of the variance as

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{(48 - 1) \times 20.38^2 + (34 - 1) \times 15.62^2}{48 + 34 - 2} = 344.6584$$

and the test statistic is

$$T^{EV}_{H_0} = \frac{\bar X_1 - \bar X_2}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{432.7 - 403.5}{\sqrt{344.6584 \times \left(\frac{1}{48} + \frac{1}{34}\right)}} = 7.01684.$$

The degrees of freedom are $\nu = n_1 + n_2 - 2 = 80$. From the R command qt(1-0.05, 80), we get $t_{0.05,80} = 1.664$. Since $T^{EV}_{H_0} > t_{0.05,80}$, we should reject $H_0$ at the 5% level. The p-value is calculated by the command 1-pt(7.01684, 80), which gives us $3.292957 \times 10^{-10}$. Since both samples have size greater than 30, there is no additional assumption for the validity of the test procedure.

(c) The 99% CI for the difference in the two means is given by (9.2.10):

$$\bar X_1 - \bar X_2 \pm t_{\nu,\alpha/2}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = 432.7 - 403.5 \pm t_{80,0.01/2}\sqrt{344.6584 \times \left(\frac{1}{48} + \frac{1}{34}\right)},$$

which gives the CI (18.22, 40.18).

(d) The command t.test(duration~route, data=dd, var.equal=T, alternative="greater") gives the test statistic value t = 7.0161 and p-value = $3.304 \times 10^{-10}$. The command t.test(duration~route, data=dd, var.equal=T, conf.level=0.99) gives the 99 percent confidence interval (18.21940, 40.18477). These are essentially the same as in parts (b) and (c), considering the round-off errors.


4. Let $\mu_1$ be the mean strength of new concrete at -8°C and $\mu_2$ be the mean strength of new concrete at 15°C.

(a) The null and alternative hypotheses are

$$H_0: \mu_1 = \mu_2 \quad \text{vs} \quad H_a: \mu_1 \ne \mu_2.$$

In order to use (9.2.14) for the testing problem, we need to make sure that the assumption $\sigma_1^2 = \sigma_2^2$ holds. According to (9.2.9),

$$\frac{\max(S_1^2, S_2^2)}{\min(S_1^2, S_2^2)} = \frac{4.92^2}{3.14^2} = 2.455 < 5.$$

Thus, the assumption $\sigma_1^2 = \sigma_2^2$ holds and, consequently, (9.2.14) can be used.

(b) We should use the statistic given in (9.2.14). We first calculate the pooled estimate of the variance as

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{(9 - 1) \times 3.14^2 + (9 - 1) \times 4.92^2}{9 + 9 - 2} = 17.033$$

and the test statistic is

$$T^{EV}_{H_0} = \frac{\bar X_1 - \bar X_2}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{62.01 - 67.38}{\sqrt{17.033 \times \left(\frac{1}{9} + \frac{1}{9}\right)}} = -2.76.$$

The degrees of freedom are $\nu = n_1 + n_2 - 2 = 16$. From the R command qt(1-0.05, 16), we get $t_{0.05,16} = 1.746$. Since $|T^{EV}_{H_0}| > t_{0.05,16}$, we should reject $H_0$ at the 10% level. The p-value is calculated by the command 2*pt(-2.76, 16), which gives us 0.01394434. In order to make the test procedure valid, we need the assumption that the samples are from a normal distribution.

(c) The 90% CI for the difference in the two means is given by (9.2.10):

$$\bar X_1 - \bar X_2 \pm t_{\nu,\alpha/2}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = 62.01 - 67.38 \pm t_{16,0.1/2}\sqrt{17.033 \times \left(\frac{1}{9} + \frac{1}{9}\right)},$$

which gives the CI (-8.76668, -1.97332).

(d) The command t.test(cs$Temp1, cs$Temp2, var.equal=T, conf.level=0.9) gives the test statistic value t = -2.7628, p-value = 0.01386, and the 90 percent confidence interval (-8.772460, -1.978651). These are essentially the same as in parts (b) and (c), considering the round-off errors.


5. Let $\mu_1$ be the mean ultimate tensile strength (UTS) of holed specimens of 7075-T6 wrought aluminum and $\mu_2$ be that of notched specimens.

(a) The null and alternative hypotheses are

$$H_0: \mu_1 - \mu_2 \le 126 \quad \text{vs} \quad H_a: \mu_1 - \mu_2 > 126.$$

In order to use (9.2.14) for the testing problem, we need to make sure that the assumption $\sigma_1^2 = \sigma_2^2$ holds. According to (9.2.9),

$$\frac{\max(S_1^2, S_2^2)}{\min(S_1^2, S_2^2)} = \frac{52.12}{25.83} = 2.02 < 3.$$

Thus, the assumption $\sigma_1^2 = \sigma_2^2$ does hold and, consequently, (9.2.14) can be used.

(b) We should use the statistic given in (9.2.14). We first calculate the pooled estimate of the variance as

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{(15 - 1) \times 52.12 + (15 - 1) \times 25.83}{15 + 15 - 2} = 38.975$$

and the test statistic is

$$T^{EV}_{H_0} = \frac{\bar X_1 - \bar X_2 - \Delta_0}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{557.47 - 421.40 - 126}{\sqrt{38.975 \times \left(\frac{1}{15} + \frac{1}{15}\right)}} = 4.417.$$

The degrees of freedom are $\nu = n_1 + n_2 - 2 = 28$. From the R command qt(1-0.05, 28), we get $t_{0.05,28} = 1.70$. Since $T^{EV}_{H_0} > t_{0.05,28}$, we should reject $H_0$ at the 5% level. The p-value is calculated by the command 1-pt(4.417, 28), which gives us $6.81 \times 10^{-5}$. In order to make the test procedure valid, we need the assumption that the samples are from a normal distribution.

(c) The 95% CI for the difference in the two means is given by (9.2.10):

$$\bar X_1 - \bar X_2 \pm t_{\nu,\alpha/2}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = 557.47 - 421.40 \pm t_{28,0.05/2}\sqrt{38.975 \times \left(\frac{1}{15} + \frac{1}{15}\right)},$$

which gives the CI (132.19, 139.95).

(d) The command t.test(uts$UTS_Holed, uts$UTS_Notched, mu=126, var.equal=T, alternative="greater") implements the test statistic in (9.2.14) and gives the test statistic value t = 4.4159 and p-value = $6.83 \times 10^{-5}$. The command t.test(uts$UTS_Holed, uts$UTS_Notched, mu=126, alternative="greater") implements the test statistic in (9.2.15) and gives the test statistic value t = 4.4159 and p-value = $8.377 \times 10^{-5}$.


6. Let $\mu_1$ be the mean full weight by machine 1 and $\mu_2$ be that by machine 2.
(a) The null and alternative hypotheses are

$$H_0: \mu_1 = \mu_2 \quad \text{vs} \quad H_a: \mu_1 \ne \mu_2.$$

In order to use (9.2.14) for the testing problem, we need to make sure that the assumption $\sigma_1^2 = \sigma_2^2$ holds. According to (9.2.9),

$$\frac{\max(S_1^2, S_2^2)}{\min(S_1^2, S_2^2)} = \frac{29.30}{26.24} = 1.12,$$

which is below the threshold in (9.2.9). Thus, the assumption $\sigma_1^2 = \sigma_2^2$ does hold and, consequently, (9.2.14) can be used.

(b) To use the statistic given in (9.2.14), we first calculate the pooled estimate of the variance as

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{(12 - 1) \times 29.30 + (12 - 1) \times 26.24}{12 + 12 - 2} = 27.77$$

and the test statistic is

$$T^{EV}_{H_0} = \frac{\bar X_1 - \bar X_2}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{966.75 - 962.33}{\sqrt{27.77 \times \left(\frac{1}{12} + \frac{1}{12}\right)}} = 2.0545.$$

The degrees of freedom are $\nu = n_1 + n_2 - 2 = 22$. From the R command qt(1-0.05/2, 22), we get $t_{0.025,22} = 2.074$. Since $|T^{EV}_{H_0}| < t_{0.025,22}$, we should not reject $H_0$ at the 5% level. The p-value is calculated by the command 2*(1-pt(2.0545,22)), which gives us 0.052.

To use the statistic given in (9.2.15), we calculate

$$T^{SS}_{H_0} = \frac{\bar X_1 - \bar X_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} = \frac{966.75 - 962.33}{\sqrt{\frac{29.30}{12} + \frac{26.24}{12}}} = 2.0545$$

with degrees of freedom

$$\nu = \left[\frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{(S_1^2/n_1)^2}{n_1 - 1} + \frac{(S_2^2/n_2)^2}{n_2 - 1}}\right] = \left[\frac{\left(\frac{29.30}{12} + \frac{26.24}{12}\right)^2}{\frac{(29.30/12)^2}{12 - 1} + \frac{(26.24/12)^2}{12 - 1}}\right] = [21.93] = 21.$$

From the R command qt(1-0.05/2, 21), we get $t_{0.025,21} = 2.08$. Since $|T^{SS}_{H_0}| < t_{0.025,21}$, we should not reject $H_0$ at the 5% level. The p-value is calculated by the command 2*(1-pt(2.0545,21)), which gives us 0.053.

In order to make the test procedure valid, we need the assumption that the samples are from a normal distribution.

(c) The 95% CI for the difference in the two means is given by (9.2.10):

$$\bar X_1 - \bar X_2 \pm t_{\nu,\alpha/2}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = 966.75 - 962.33 \pm t_{22,0.05/2}\sqrt{27.77 \times \left(\frac{1}{12} + \frac{1}{12}\right)},$$

which gives the CI (-0.04164, 8.8816). Since 0 is included in the CI, we would not reject $H_0$ at the 5% level. The result is the same as in part (b).
7. Let $p_1$ be the proportion of having the number four in the first post-decimal digit reported by firms with analyst coverage and $p_2$ be that reported by firms with no analyst coverage.

(a) The null and alternative hypotheses are

$$H_0: p_1 = p_2 \quad \text{vs} \quad H_a: p_1 \ne p_2.$$

We use the procedure given in (9.2.20) to test the hypotheses. We calculate

$$\hat p_1 = \frac{692}{9396}, \quad \hat p_2 = \frac{1182}{13985}, \quad \text{and} \quad \hat p = \frac{692 + 1182}{9396 + 13985}.$$

The test statistic is calculated as

$$Z_{H_0} = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1 - \hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = -3.001.$$

The critical value is $z_{0.01/2} = 2.576$. Since $|Z_{H_0}| > z_{0.01/2}$, we should reject $H_0$ at the 1% level. The p-value is calculated by 2*pnorm(-3.001), which returns 0.0027.

(b) Using (9.2.11), the 99% CI for $p_1 - p_2$ is given by

$$\hat p_1 - \hat p_2 \pm z_{0.01/2}\sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}}$$

and the calculation gives (-0.0201, -0.0017).

(c) We use the R command prop.test(c(692, 1182), c(9396, 13985), correct=F, conf.level=0.99) and it returns 0.0027 and (-0.0201, -0.0017) for the p-value and 99% CI, respectively.
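The two-proportion test and CI above can also be computed step by step, which makes it clear that the test uses the pooled proportion while the CI uses the separate sample proportions. `two_prop_z` is a hypothetical helper name in this sketch.

```r
# Two-sided test and CI for the difference of two proportions.
# two_prop_z is a hypothetical helper, not a base R function.
two_prop_z <- function(x1, n1, x2, n2, conf = 0.99) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  pp <- (x1 + x2) / (n1 + n2)                                   # pooled proportion (test)
  z  <- (p1 - p2) / sqrt(pp * (1 - pp) * (1 / n1 + 1 / n2))
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)           # unpooled SE (CI)
  zq <- qnorm(1 - (1 - conf) / 2)
  list(statistic = z, p.value = 2 * pnorm(-abs(z)),
       ci = c(p1 - p2 - zq * se, p1 - p2 + zq * se))
}

res <- two_prop_z(692, 9396, 1182, 13985)
print(res)
```

`prop.test` with `correct = FALSE` reports the square of this Z statistic as its chi-square statistic, so the two-sided p-values agree.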


8. Let $p_1$ be the success rate of tears greater than 25 millimeters and $p_2$ be that of the tears less than 25 millimeters.

(a) The null and alternative hypotheses are

$$H_0: p_1 = p_2 \quad \text{vs} \quad H_a: p_1 \ne p_2.$$

We use the procedure given in (9.2.20) to test the hypotheses. We calculate

$$\hat p_1 = \frac{10}{18}, \quad \hat p_2 = \frac{22}{30}, \quad \text{and} \quad \hat p = \frac{10 + 22}{18 + 30} = \frac{32}{48}.$$

The test statistic is calculated as

$$Z_{H_0} = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1 - \hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = -1.265.$$

The critical value is $z_{0.1/2} = 1.645$. Since $|Z_{H_0}| < z_{0.1/2}$, we should not reject $H_0$ at the 10% level. The p-value is calculated by 2*pnorm(-1.265), which returns 0.2059.

(b) Using (9.2.11), the 90% CI for $p_1 - p_2$ is given by

$$\hat p_1 - \hat p_2 \pm z_{0.1/2}\sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}}$$

and the calculation gives (-0.4118, 0.0562).

(c) We use the R command prop.test(c(10, 22), c(18, 30), correct=F, conf.level=0.9) and it returns 0.2059 and (-0.4118, 0.0562) for the p-value and 90% CI, respectively.


9. Let $p_1$ be the proportion of correctly identifying the signal from the right and $p_2$ be the proportion of correctly identifying the signal from the left.

(a) The null and alternative hypotheses are

$$H_0: p_1 = p_2 \quad \text{vs} \quad H_a: p_1 \ne p_2.$$

We use the procedure given in (9.2.20) to test the hypotheses. We calculate

$$\hat p_1 = \frac{85}{100} = 0.85, \quad \hat p_2 = \frac{87}{100} = 0.87, \quad \text{and} \quad \hat p = \frac{85 + 87}{100 + 100} = 0.86.$$

The test statistic is calculated as

$$Z_{H_0} = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1 - \hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.85 - 0.87}{\sqrt{0.86 \times (1 - 0.86) \times \left(\frac{1}{100} + \frac{1}{100}\right)}} = -0.4076.$$

The critical value is $z_{0.01/2} = 2.576$. Since $|Z_{H_0}| < z_{0.01/2}$, we should not reject $H_0$ at the 1% level. The p-value is calculated by 2*(1-pnorm(0.4076)), which returns 0.6836.

(b) Using (9.2.11), the 99% CI for $p_1 - p_2$ is given by

$$\hat p_1 - \hat p_2 \pm z_{0.01/2}\sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}}$$

and the calculation gives (-0.1463, 0.1063).

(c) We use the R command prop.test(c(85, 87), c(100, 100), correct=F, conf.level=0.99) and it returns 0.6836 and (-0.1463, 0.1063) for the p-value and 99% CI, respectively.
10. Let $p_1$ be the proportion of type A cars that sustained no visible damage in the 10-mph crash test and $p_2$ be the proportion of type B cars that sustained no visible damage in the 10-mph crash test.

(a) The null and alternative hypotheses are

$$H_0: p_1 \ge p_2 \quad \text{vs} \quad H_a: p_1 < p_2.$$

We use the procedure given in (9.2.20) to test the hypotheses. We calculate

$$\hat p_1 = \frac{19}{85}, \quad \hat p_2 = \frac{22}{85}, \quad \text{and} \quad \hat p = \frac{19 + 22}{85 + 85} = \frac{41}{170}.$$

The test statistic is calculated as

$$Z_{H_0} = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1 - \hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = -0.5378.$$

The critical value is $z_{0.05/2} = 1.96$. Since $Z_{H_0} > -z_{0.05/2}$, we should not reject $H_0$ at the 5% level. The p-value is calculated by pnorm(-0.5378), which returns 0.2954.

(b) Using (9.2.11), the 95% CI for $p_1 - p_2$ is given by

$$\hat p_1 - \hat p_2 \pm z_{\alpha/2}\sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}}$$

and the calculation gives (-0.1431, 0.0726).

(c) We use the R command prop.test(c(19, 22), c(85, 85), alternative="less", correct=F) and it returns 0.2953 for the p-value. We use the R command prop.test(c(19, 22), c(85, 85), correct=F, conf.level=0.9) and it returns (-0.1431, 0.0726) for the 90% CI.


9.3 The Rank-Sum Test Procedure


1. Let $\tilde\mu_S$ be the median rainfall of the seeded clouds and $\tilde\mu_C$ be that of the control clouds. The null and alternative hypotheses are

$$H_0: \tilde\mu_S \le \tilde\mu_C \quad \text{vs} \quad H_a: \tilde\mu_S > \tilde\mu_C.$$

The commands

CSD = read.table("CloudSeedingData.txt", header = T);
wilcox.test(CSD$Seeded, CSD$Control, alternative = "greater", conf.int=T)

give a p-value of 0.007. So $H_0$ should be rejected at level 0.05. The command wilcox.test(CSD$Seeded, CSD$Control, conf.int=T) gives the 95% CI for the median of the difference in rainfall between a seeded and an unseeded cloud as (14.10, 237.60).


2. (a) Let $\tilde\mu_E$ be the median pollutant concentration on the east side of the lake and $\tilde\mu_W$ be that of the west side. The null and alternative hypotheses are

$$H_0: \tilde\mu_E = \tilde\mu_W \quad \text{vs} \quad H_a: \tilde\mu_E \ne \tilde\mu_W.$$

The test procedure (9.3.5) is not recommended for this data set because the data set sizes for the east side and the west side are both less than 8.
(b) The commands


& = €(186, 2.60.18, 2:41, 1.87, 2.89): 
W= c(1.70, 8.84, 1.18, 4.97, 0.86, 1.93, 3.86); 
wilcox.test(E, W, conf.int=T) 
give the p-value 0.945 and (-1.98, 1.74) as the 95% CI for the median of the difference between a measurement from the eastern part of the lake and one from the western part. Thus, we should not reject $H_0$ at the 0.1 level and conclude that there is not enough evidence to show that the pollutant concentration on the two sides of the lake is significantly different.


3. (a) Let $\tilde\mu_S$ be the median total strain amplitude of spheroidal graphite (SG) cast
iron and $\tilde\mu_C$ be the median total strain amplitude of compacted graphite (CG)
cast iron. The null and alternative hypotheses are

$H_0: \tilde\mu_S = \tilde\mu_C$ vs $H_a: \tilde\mu_S \ne \tilde\mu_C$.

To conduct the test procedure in (9.3.5), we use the commands

S = c(105, 77, 52, 27, 22, 17, 12, 14, 65); n1 = length(S);
C = c(90, 50, 30, 20, 14, 10, 60, 24, 76); n2 = length(C); N = n1+n2;
x = c(S, C); r = rank(x); w1 = sum(r[1:n1]); w2 = sum(r[n1+1:n2]);
s2r = sum((r-(N+1)/2)**2)/(N-1);
z = (w1/n1-w2/n2)/sqrt(s2r*(1/n1+1/n2)); z

We get the z-value $Z_{H_0} = 0.088$. The p-value is given by 2*(1-pnorm(0.088)),
which returns 0.93. Thus, we should not reject $H_0$ at level of significance
0.05, and we conclude that there is not enough evidence that the total
strain amplitude properties of the two types of cast iron differ
significantly.

(b) The command wilcox.test(S, C, conf.int=T) gives the exact p-value 0.9648
and (-33.00, 38.00) as the 95% CI for the median of the difference between a
measurement from SG and one from CG cast iron.
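
The commands of part (a) can be wrapped in a small reusable function. The sketch below is illustrative only: the function name rank_sum_z is ours, and it assumes the statistic in (9.3.5) is the standardized difference of rank averages computed above, with midranks used for ties.

```r
# Large-sample rank-sum z statistic from two samples (midranks for ties).
rank_sum_z <- function(x1, x2) {
  n1 <- length(x1); n2 <- length(x2); N <- n1 + n2
  r  <- rank(c(x1, x2))                       # ranks in the combined sample
  w1 <- sum(r[1:n1]); w2 <- sum(r[(n1 + 1):N])
  s2r <- sum((r - (N + 1) / 2)^2) / (N - 1)   # variance of the combined ranks
  z  <- (w1 / n1 - w2 / n2) / sqrt(s2r * (1 / n1 + 1 / n2))
  c(w1 = w1, z = z, p_value = 2 * (1 - pnorm(abs(z))))
}

S <- c(105, 77, 52, 27, 22, 17, 12, 14, 65)
C <- c(90, 50, 30, 20, 14, 10, 60, 24, 76)
res <- rank_sum_z(S, C)
res
```

Writing the index as (n1 + 1):N avoids relying on operator precedence in r[n1+1:n2], which works only because ":" binds more tightly than "+".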
4. (a) We use the commands

FL = read.table("FemurLoads.txt", header = T);
boxplot(FL$X2800lbs, col = "red");
boxplot(FL$X3200lbs, col = "green")

to plot the boxplots in red and green, respectively. They are shown
below.

There are no outliers in the boxplots and the plots look symmetric. Therefore,
the normality assumption seems tenable.

(b) Let $\tilde\mu_1$ be the median femur load for type 1 vehicles and $\tilde\mu_2$ be that for type
2 vehicles. The null and alternative hypotheses are

$H_0: \tilde\mu_1 = \tilde\mu_2$ vs $H_a: \tilde\mu_1 \ne \tilde\mu_2$.

The command wilcox.test(FL$X2800lbs, FL$X3200lbs, conf.int = T) gives the
exact p-value 0.4483 and (-469, 270) as the 95% CI for the median of the
difference between a femur load measurement from a type 1 vehicle and one
from a type 2 vehicle. Thus, we should not reject $H_0$ at level 0.1, and we
conclude that there is not enough evidence that the femur loads for
the two types of cars differ significantly.


5. For the t-test, the commands dd = read.table("DriveDurat.txt", header = T);
t.test(duration~route, data = dd, var.equal = T, alternative = "greater") give the p-value
$3.304 \times 10^{-10}$. For the MWW rank-sum test, the command
wilcox.test(duration~route, alternative = "greater", data = dd) gives the p-value
$1.19 \times 10^{-8}$. The command t.test(duration~route, data = dd, var.equal = T,
conf.level = 0.9) gives the 90% confidence interval (22.27571, 36.12846)
corresponding to the t-test, and the command wilcox.test(duration~route, conf.int = T,
conf.level = 0.9, data = dd) gives the 90% confidence interval (22.80, 37.70)
corresponding to the MWW rank-sum test.


9.4 Comparing Two Variances 


1. Let $\sigma_S^2$ be the variance of total strain amplitude of spheroidal graphite (SG) cast
iron and $\sigma_C^2$ be that of compacted graphite (CG) cast iron. The null and alternative
hypotheses are

$H_0: \sigma_S^2 = \sigma_C^2$ vs $H_a: \sigma_S^2 \ne \sigma_C^2$.

To conduct Levene's test, we use the commands

S = c(105, 77, 52, 27, 22, 17, 12, 14, 65); n1 = length(S);
C = c(90, 50, 30, 20, 14, 10, 60, 24, 76); n2 = length(C);
library(lawstat); x = c(S, C); ind = c(rep(1, n1), rep(2, n2)); levene.test(x, ind).

The code gives the p-value 0.7887; thus we cannot reject $H_0$ at the significance
level 0.05.


2. Let $\sigma_1^2$ be the variance of fatigue crack growth in aluminum with thickness of 3 mm
and $\sigma_2^2$ be that of aluminum with thickness of 15 mm. The null and alternative
hypotheses are

$H_0: \sigma_1^2 = \sigma_2^2$ vs $H_a: \sigma_1^2 \ne \sigma_2^2$.

To conduct the F-test, we calculate

$$F_{H_0} = \frac{S_1^2}{S_2^2} = \frac{3954^2}{15533^2} = 0.0648 \quad\text{and}\quad \frac{1}{F_{H_0}} = 15.43.$$

The degrees of freedom for $F_{H_0}$ are $\nu_1 = n_1 - 1 = 35$ and $\nu_2 = n_2 - 1 = 41$. Thus,
$F_{35,41,0.025} = 1.895$ by the command qf(0.975, 35, 41) and $F_{41,35,0.025} = 1.98$ by the
command qf(0.975, 41, 35). Since $1/F_{H_0} > F_{41,35,0.025}$, we should reject $H_0$ at the 5%
level. To calculate the p-value, we use 2*min(1-pf(0.0648, 35, 41), 1-pf(15.43, 41,
35)), which gives $4.93 \times 10^{-13}$. For the test to be valid, we need to assume
that the data are from normal populations.
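
The same F-test can be run from the summary statistics alone. The following is a minimal sketch, assuming only the sample standard deviations (3954 and 15533) and degrees of freedom (35 and 41) quoted above; the variable names are ours.

```r
# Two-sided F-test for equality of two variances from summary statistics.
s1 <- 3954;  df1 <- 35    # sample SD and degrees of freedom, 3 mm thickness
s2 <- 15533; df2 <- 41    # sample SD and degrees of freedom, 15 mm thickness

f_stat <- s1^2 / s2^2

# Reject H0 at level 0.05 if either ratio exceeds its upper 2.5% point.
reject <- (f_stat > qf(0.975, df1, df2)) || (1 / f_stat > qf(0.975, df2, df1))

# Two-sided p-value: twice the smaller tail probability of f_stat.
p_value <- 2 * min(pf(f_stat, df1, df2), 1 - pf(f_stat, df1, df2))

f_stat; reject; p_value
```

Here pf(f_stat, df1, df2) equals 1 - pf(1/f_stat, df2, df1), so the p-value agrees with the 2*min(...) expression used in the solution.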


3. Let $\sigma_1^2$ be the variance of delivery time for the standard route and $\sigma_2^2$ be that of the
new route. The null and alternative hypotheses are

$H_0: \sigma_1^2 = \sigma_2^2$ vs $H_a: \sigma_1^2 \ne \sigma_2^2$.

To conduct the F-test, we use the commands
dd = read.table("DriveDurat.txt", header = T); var.test(duration~route, data = dd).

The code gives the p-value 0.1108; thus we cannot reject $H_0$ at the significance
level 0.05. For the test to be valid, we need to assume that the data are
from normal populations.


9.5 Paired Data 


1. Let $\mu_A$ and $\mu_B$ be the average times required to parallel park type A and type B
cars, respectively. The null and alternative hypotheses are

$H_0: \mu_A - \mu_B = 0$ vs $H_a: \mu_A - \mu_B \ne 0$.

(a) To perform the paired T-test, we use the following commands:


A=c(19.0, 21.8, 16.8, 24.2, 22.0, 84.7, 28.8); 
B=c(17.8, 20.2, 16.2, 41.4, 21.4, 28.4, 22.1); 
t.test(A, B, paired = T) 


The commands return 0.78 as the p-value; thus we should not reject $H_0$ at
significance level 0.05. For the test procedure to be valid, the differences
must be normally distributed. To check this assumption, we draw the boxplot
of the differences with the command boxplot(A-B); it is shown on the next page.

This plot shows that there are two outliers and, thus, the normality assumption
is suspect.

(b) To apply the signed-rank test, we use the command wilcox.test(A, B, paired =
T), which gives the p-value 0.271. Thus we should not reject $H_0$ at significance
level 0.05.


2. Let $\mu_1$ and $\mu_2$ be the means of the first test and of the second test, respectively. The
null and alternative hypotheses are

$H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \ne 0$.


Whe commands (1 =<c/12, 1.4, 15, 14)-1-7, 16, 14, 13) 2 = al 4, 1%, 18, 
1.3, 2.0, 2.1, 1.7, 1.6); t.test(t1, t2, paired = T) can be used to solve this problem. 


(a) The returned p-value is 0.0103, which is less than 0.05. Thus, we should reject
$H_0$ and conclude that the specialty steel manufacturer should adopt the second
method.


(b) The 95% CI for the mean difference is (-0.3569, -0.0681). 


3. Let $\mu_1$ and $\mu_2$ be the average percent of soil passing through the sieve for the two
locations. Then, the null and alternative hypotheses are

$H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \ne 0$.

(a) We use the commands SDN = read.table("SoilDataNhi.txt", header = T);
attach(SDN) to read the data. The command t.test(Soil1, Soil2, paired = T) gives
0.037 as the p-value and (-5.490, -0.185) as the 95% CI for the difference of
the population means. Thus, we should reject $H_0$ at the significance level 0.05.

(b) To use the signed-rank procedure, we use the command wilcox.test(Soil1, Soil2,
paired = T, conf.int = T), which gives 0.037 as the p-value and (-5.450,
-0.100) as the 95% CI for the difference of the population means. Comparing
with part (a), we see that the p-values and CIs from the two procedures are
similar.

(c) The command for the t-test is t.test(Soil1, Soil2), which gives 0.215 as the
p-value and (-7.364, 1.690) as the 95% CI for the difference of the population
means. Thus, we should not reject $H_0$ at the significance level 0.05.
The command for the rank-sum procedure is wilcox.test(Soil1, Soil2, conf.int = T),
which gives 0.340 as the p-value and (-6.900, 2.300) as the 95% CI for the
difference of the population means. Thus, we should not reject $H_0$ at the
significance level 0.05.

Clearly, these results are very different from those in parts (a) and (b).


4. We use the commands LifeTime = read.table("McycleTiresLifeT.txt", header = T);
attach(LifeTime) to read the data.

(a) Let $\mu_1$ and $\mu_2$ be the mean lifetimes of Brand 1 and Brand 2 tires, respectively.
The null and alternative hypotheses are

$H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \ne 0$.

To perform the paired t-test with the 90% CI, we use the command t.test(Brand1,
Brand2, paired = T, conf.level = 0.9), which returns 0.0986 as the p-value and
(4.2661, 1732.4839) as the 90% CI for the mean difference in lifetime. Since
the p-value is greater than 0.05, we should not reject $H_0$ at the 5% level. For
the test procedure to be valid, the differences must be normally distributed.

(b) To use the signed-rank procedure, we use the command wilcox.test(Brand1,
Brand2, paired = T), which gives 0.1094 as the p-value. Since the p-value
is greater than 0.05, we should not reject $H_0$ at the 5% level.


5. In this table, we have $n = 1325 + 3 + 13 + 59 = 1400$, $Y_2 = 3$, $Y_3 = 13$,
$\hat q_2 = 3/1400$, and $\hat q_3 = 13/1400$. According to (9.5.10), McNemar's test statistic is

$$MN = \frac{Y_2 - Y_3}{\sqrt{Y_2 + Y_3}} = \frac{3 - 13}{\sqrt{3 + 13}} = -2.5$$

and, according to (9.5.9), the paired t-test statistic is

$$T_{H_0} = \frac{\hat q_2 - \hat q_3}{\sqrt{(\hat q_2 + \hat q_3 - (\hat q_2 - \hat q_3)^2)/(n - 1)}} = -2.505.$$

Because of the large sample size we use $z_{\alpha/2} = z_{0.025} = 1.96$ as the critical point.
Since both $|MN|$ and $|T_{H_0}|$ are greater than 1.96, we reject the null hypothesis and
conclude that the two algorithms do not have the same error rates.
6. In this table, we have $n = 260$, $Y_2 = 62$, $Y_3 = 95$, $\hat q_2 = 62/260$, and $\hat q_3 = 95/260$.
According to (9.5.10), McNemar's test statistic is

$$MN = \frac{Y_2 - Y_3}{\sqrt{Y_2 + Y_3}} = \frac{62 - 95}{\sqrt{62 + 95}} = -2.634$$

and, according to (9.5.9), the paired t-test statistic is

$$T_{H_0} = \frac{\hat q_2 - \hat q_3}{\sqrt{(\hat q_2 + \hat q_3 - (\hat q_2 - \hat q_3)^2)/(n - 1)}} = -2.664.$$

Because of the large sample size we use $z_{\alpha/2} = z_{0.025} = 1.96$ as the critical point.
Since both $|MN|$ and $|T_{H_0}|$ are greater than 1.96, we reject the null hypothesis and
conclude that there was a change in voter attitude.
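
Both statistics in Exercises 5 and 6 depend only on the two discordant counts and the sample size, so they are easy to check in R. The helper below is a sketch; the name paired_binary_stats is ours, and the formulas are the ones labelled (9.5.10) and (9.5.9) above.

```r
# McNemar statistic and paired t statistic from the discordant counts y2, y3
# of a 2x2 table of paired binary outcomes with n pairs in total.
paired_binary_stats <- function(y2, y3, n) {
  mn <- (y2 - y3) / sqrt(y2 + y3)
  q2 <- y2 / n
  q3 <- y3 / n
  t_stat <- (q2 - q3) / sqrt((q2 + q3 - (q2 - q3)^2) / (n - 1))
  c(MN = mn, T = t_stat)
}

ex5 <- paired_binary_stats(3, 13, 1400)   # Exercise 5
ex6 <- paired_binary_stats(62, 95, 260)   # Exercise 6
ex5; ex6
```

The two statistics are close whenever n is large, which is why the solutions compare both with the same normal critical value.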
10.2 Types of k-Sample Tests


1. (a) Let $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$ be the average tread lives of the four types of truck tires,
respectively. The null and alternative hypotheses are

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ vs $H_a$: $H_0$ is not true.

We use the R command TL = read.table("TireLife1Way.txt", header = T) to
read the data and the commands fit = aov(TL$values~as.factor(TL$ind));
anova(fit) to get the p-value 0.1334, which is greater than the given
significance level 0.1. Thus, the null hypothesis should not be rejected.

For the test procedure to be valid, we need the assumptions that the
samples are independent and from normal populations, and that the population
variances are equal.


(b)

(i) The contrast is $\theta = (\mu_1 + \mu_2)/2 - (\mu_3 + \mu_4)/2$. To compare the two brands,
the related null and alternative hypotheses are

$H_0: \theta = 0$ vs $H_a: \theta \ne 0$.

(ii) To test the hypotheses, we use the commands

attach(TL); sm = by(values, ind, mean); svar = by(values, ind, var);
t = (sm[1]+sm[2]-sm[3]-sm[4])/2; st = sqrt(mean(svar)*(1/4*2/7*2));
t-qt(0.95, 24)*st; t+qt(0.95, 24)*st; TS = t/st; 2*(1-pt(abs(TS), 24)).

The test statistic is $T_{H_0} = -2.01$ and the corresponding p-value is
0.056, which is less than the given significance level 0.1. Thus, the null
hypothesis $H_0$ should be rejected. The 90% CI is (-1.31, -0.11).

(iii) The outcome of the test for the specialized contrast is NOT in agreement
with the outcome of the test for the overall null hypothesis in part (a),
because $\theta \ne 0$ means that $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ is not true. The t-test
for a specialized contrast is more powerful than the F test for the equality
of all means.
2. (a) The contrast is $\theta = (\mu_1 + \mu_3)/2 - \mu_2$. To test the hypotheses

$H_0: \theta = 0$ vs $H_a: \theta \ne 0$,

we use the commands

fl = read.table("Flammability.txt", header = T); attach(fl);
sm = by(BurnL, Material, mean); sv = by(BurnL, Material, var);
t = (sm[1]+sm[3])/2-sm[2]; st = sqrt(mean(sv)*(1/4*2/6+1*1/6));
TS = t/st; 2*(1-pt(abs(TS), 15)).

The commands give a p-value of 0.104. Thus, we should not reject the null
hypothesis at significance level 0.05.

(b) The R commands given in the hint return a p-value of 0.035. Thus, we should
reject the null hypothesis that the combined population of materials 1 and 3
is the same as that of material 2.


3. (a) Let $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$ be the average REM (rapid eye movement) sleep times
at the four concentrations of ethanol, respectively. The null and alternative
hypotheses are

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ vs $H_a$: $H_0$ is not true.

Using hand calculations, we have

$$\bar X = \frac{\bar X_1 + \bar X_2 + \bar X_3 + \bar X_4}{4} = \frac{79.28 + 61.54 + 47.92 + 32.76}{4} = 55.375$$

and

$$SSTr = \sum_{i=1}^{4} n_i(\bar X_i - \bar X)^2 = 5(79.28 - 55.375)^2 + 5(61.54 - 55.375)^2 + 5(47.92 - 55.375)^2 + 5(32.76 - 55.375)^2 = 5882.358,$$

$$MSTr = \frac{SSTr}{k - 1} = \frac{5882.358}{3} = 1960.786.$$

Finally,

$$F_{H_0} = \frac{MSTr}{MSE} = \frac{1960.786}{92.95} = 21.095.$$

The p-value is calculated by 1-pf(21.095, 3, 20-4), which returns $8.318 \times 10^{-6}$.
The null hypothesis should be rejected.

For the test procedure to be valid, we need the assumptions that the
samples are independent and from normal populations, and that the population
variances are equal.
(b) We use the R command REM = read.table("SleepRem.txt", header = T);
attach(REM) to read the data, and use the commands fit = aov(values~as.factor(ind));
anova(fit) to get the ANOVA table below.

                 DF       SS        MS        F     P
as.factor(ind)    3   5881.7   1960.58   21.093     8.322 x 10^-6
Residuals        16   1487.1     92.95

Clearly, the p-value is $8.322 \times 10^{-6}$ and the null hypothesis should be rejected.


(c) To test the assumptions of equal variances, we use the command 
anova(aov(resid(aov(values~ind))**2~ind)), 


and it returns a p-value of 0.621. This suggests that the assumption of equal 
variance is approximately valid. To test normality, we use the command 


shapiro.test(resid(aov(values~ind))), 


and it returns a p-value of 0.1285. This suggests that the normality assumption 
approximately holds. 


We use the commands fit = aov(values~ind); boxplot(resid(fit)) to get the
boxplot of the residuals and use the command plot(fit, which=2) to get the
Q-Q plot for the residuals, shown below.

[Figure: normal Q-Q plot of the standardized residuals against the theoretical quantiles for aov(values ~ ind)]


These plots also suggest that the normality assumption is approximately sat- 
isfied, in agreement with the Shapiro-Wilk test p-value. 


4. (a) Use the commands ranks=rank(values); rms=by(ranks, ind, mean) to get the
rank averages 17.6, 12.6, 7.8, and 4. Thus, the Kruskal-Wallis statistic is
calculated as

$$KW = \frac{12}{N(N+1)}\sum_{i=1}^{k} n_i\left(\bar R_i - \frac{N+1}{2}\right)^2 = \frac{12}{20 \times 21}\left[5(17.6 - 21/2)^2 + 5(12.6 - 21/2)^2 + 5(7.8 - 21/2)^2 + 5(4 - 21/2)^2\right] = 14.90857.$$

The degrees of freedom are $k - 1 = 3$. Thus, the p-value can be found from the
command 1-pchisq(14.90857,3), which gives 0.001896473. Since the p-value is
less than the given significance level, the null hypothesis should be rejected.


To make the test valid, we need the assumption of continuous population. 


(b) The command gives the Kruskal-Wallis statistic as 14.9086 and the p-value 
as 0.001896, which is less than the given significant level and thus, the null 
hypothesis should be rejected. 
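
The hand calculation in part (a) needs only the rank averages and the group sizes. A minimal sketch, assuming no ties among the observations (so that the variance of the ranks is N(N+1)/12); the function name is ours:

```r
# Kruskal-Wallis statistic from group rank averages (no-ties form).
kw_from_rank_means <- function(rank_means, n) {
  N <- sum(n)
  12 / (N * (N + 1)) * sum(n * (rank_means - (N + 1) / 2)^2)
}

kw <- kw_from_rank_means(c(17.6, 12.6, 7.8, 4), rep(5, 4))
p_value <- 1 - pchisq(kw, df = 3)   # k - 1 = 3 degrees of freedom
kw; p_value
```

With the raw data available, kruskal.test() computes the same statistic directly and also handles ties.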


5. (a) Let $\mu_1$, $\mu_2$, and $\mu_3$ be the average oxygen diffusivity at the three mole fraction of
water levels. The null and alternative hypotheses are

$H_0: \mu_1 = \mu_2 = \mu_3$ vs $H_a$: $H_0$ is not true.

The complete ANOVA table is

            DF   Sum Sq    Mean Sq   F-Value
Treatment    2    0.019   0.0095     0.009179113
Residuals   24   24.839   1.034958

(b) The critical value $F_{k-1,N-k,\alpha}$ can be calculated by qf(0.95, 2, 24), which gives
3.40. Since $F_{H_0} < F_{k-1,N-k,\alpha}$, we should not reject $H_0$.

(c) The p-value is computed as 1-pf(0.009179113, 2, 24), which returns 0.9908664.
Thus, the null hypothesis should not be rejected at level 0.05.


6. (a) Let $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$ be the average pore size of carbon made at the four
different temperatures. The null and alternative hypotheses are

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ vs $H_a$: $H_0$ is not true.

Using hand calculations, we have

$$\bar X = \frac{\bar X_1 + \bar X_2 + \bar X_3 + \bar X_4}{4} = \frac{7.43 + 7.24 + 6.66 + 6.24}{4} = 6.8925$$

and

$$SSTr = \sum_{i=1}^{4} n_i(\bar X_i - \bar X)^2 = 5(7.43 - 6.8925)^2 + 5(7.24 - 6.8925)^2 + 5(6.66 - 6.8925)^2 + 5(6.24 - 6.8925)^2 = 4.447375,$$

thus

$$MSTr = \frac{SSTr}{k - 1} = \frac{4.447375}{3} = 1.482458,$$

$$MSE = S_p^2 = \frac{(n_1 - 1)S_1^2 + \cdots + (n_4 - 1)S_4^2}{n_1 + \cdots + n_4 - k} = 0.129625,$$

where the denominator is $20 - 4 = 16$. Finally,

$$F_{H_0} = \frac{MSTr}{MSE} = \frac{1.482458}{0.129625} = 11.43651.$$

The p-value is calculated by 1-pf(11.43651, 3, 20-4), which returns 0.000296.
Thus, the null hypothesis should be rejected at significance level 0.05.

For the test procedure to be valid, we need the assumptions that the
samples are independent and from normal populations, and that the population
variances are equal.

(b) We use the R command pc = read.table("PorousCarbon.txt", header = T) to
read the data and the commands fit = aov(pc$values~as.factor(pc$temp));
anova(fit) to get the F-value 11.437 and the p-value 0.0002958, which is
less than the given significance level 0.05. Thus, the null hypothesis should be
rejected.
(c) To test the assumption of equal variances, we use the command

attach(pc); anova(aov(resid(aov(values~temp))**2~temp))

and it returns a p-value of 0.0656. This suggests that the assumption of equal
variance is approximately valid. To test normality, we use the command

shapiro.test(resid(aov(values~temp))),

and it returns a p-value of 0.8255. This suggests that the normality assumption
approximately holds.

We use the commands fit = aov(values~temp); boxplot(resid(fit)) to get the
boxplot of the residuals and use the command plot(fit, which=2) to get the
Q-Q plot for the residuals, shown below.

[Figure: normal Q-Q plot of the standardized residuals against the theoretical quantiles for aov(values ~ temp)]

These plots also suggest that the normality assumption is approximately
satisfied, in agreement with the Shapiro-Wilk test p-value.


7. (a) Using the commands attach(pc); ranks=rank(values); vranks=var(ranks); rms=
by(ranks, temp, mean), we get $S_{KW}^2 = 34.55263$. The rank averages are 15.9,
14.7, 7.8, and 3.6. Thus, the Kruskal-Wallis statistic is calculated as

$$KW = \frac{1}{S_{KW}^2}\sum_{i=1}^{k} n_i\left(\bar R_i - \frac{N+1}{2}\right)^2 = \frac{1}{34.55263}\left[5(15.9 - 21/2)^2 + 5(14.7 - 21/2)^2 + 5(7.8 - 21/2)^2 + 5(3.6 - 21/2)^2\right] = 14.71668.$$

The degrees of freedom are $k - 1 = 3$. Thus, the p-value can be found from
the command 1-pchisq(14.71668,3), which gives 0.0021. Since the p-value is
less than the given significance level, the null hypothesis should be rejected.

To make the test valid, we need the assumption of continuous populations.

(b) The command kruskal.test(values~temp) gives the Kruskal-Wallis statistic
14.7167 and the p-value 0.002075. Thus, the null hypothesis should be
rejected at level 0.05.


8. Let $\mu_1$, $\mu_2$, and $\mu_3$ be the average strength using fixed-platen testers for types 1, 2, and
3 of corrugated containers. The null and alternative hypotheses are

$H_0: \mu_1 = \mu_2 = \mu_3$ vs $H_a$: $H_0$ is not true.

From the given information, we calculate

$$\bar X = \frac{1}{36 + 49 + 42}(36 \times 754 + 49 \times 769 + 42 \times 776) = 767.063$$

and

$$SSTr = \sum_{i=1}^{3} n_i(\bar X_i - \bar X)^2 = 36(754 - 767.063)^2 + 49(769 - 767.063)^2 + 42(776 - 767.063)^2 = 9681.496.$$

Thus,

$$MSTr = \frac{SSTr}{k - 1} = \frac{9681.496}{3 - 1} = 4840.748.$$

The mean square error is

$$MSE = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2 + (n_3 - 1)S_3^2}{n_1 + n_2 + n_3 - k} = \frac{(36 - 1) \times 16^2 + (49 - 1) \times 27^2 + (42 - 1) \times 38^2}{36 + 49 + 42 - 3} = 831.9032.$$

Finally, the F statistic is

$$F_{H_0} = \frac{MSTr}{MSE} = \frac{4840.748}{831.9032} = 5.818884$$

and the degrees of freedom are $k - 1 = 2$ and $N - k = 36 + 49 + 42 - 3 = 124$.
$F_{k-1,N-k,\alpha}$ can be calculated by qf(1-0.05, 2, 124), which is 3.07. Clearly,
$F_{H_0} > F_{k-1,N-k,\alpha}$. The p-value is calculated by 1-pf(5.818884, 2, 124), which returns
0.003841892. Thus, we should reject the null hypothesis at the 0.05 level.
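
The calculation above uses only the group sizes, sample means, and sample standard deviations, so it can be scripted once and reused for any one-way layout reported in summary form. The function below is an illustrative sketch (its name and argument names are ours):

```r
# One-way ANOVA F test from summary statistics.
# n: group sizes, xbar: group means, s2: group sample variances.
anova_from_summary <- function(n, xbar, s2) {
  k <- length(n)
  N <- sum(n)
  grand <- sum(n * xbar) / N                        # overall mean
  mstr  <- sum(n * (xbar - grand)^2) / (k - 1)      # treatment mean square
  mse   <- sum((n - 1) * s2) / (N - k)              # pooled variance
  f <- mstr / mse
  list(F = f, df = c(k - 1, N - k), p_value = 1 - pf(f, k - 1, N - k))
}

fit <- anova_from_summary(n = c(36, 49, 42),
                          xbar = c(754, 769, 776),
                          s2 = c(16, 27, 38)^2)
fit
```

The small difference from the hand value in later decimals comes from rounding the overall mean to 767.063 in the hand calculation.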


9. (a) To test the assumption of equal variances, we use the commands

ff = read.table("FlexFatig.txt", header = T); attach(ff);
anova(aov(resid(aov(values~ind))**2~ind))

and they return the p-value 0.0407. This suggests that the assumption of equal
variance is suspect. To test normality, we use the command

shapiro.test(resid(aov(values~ind)))

and it returns a p-value of 0.008378. This suggests that the normality
assumption does not hold.

We use the commands fit = aov(values~ind); boxplot(resid(fit)) to get the
boxplot of the residuals and use the command plot(fit, which=2) to get the
Q-Q plot for the residuals, shown on the next page.

These plots also suggest that the normality assumption is not satisfied, in
agreement with the Shapiro-Wilk test p-value.

(b) The Kruskal-Wallis test can be used for this problem because it is applicable with
both small and large sample sizes regardless of the normality assumption.

(c) The command kruskal.test(values~ind) gives the Kruskal-Wallis statistic 11.6167
and the p-value 0.02044. Since 0.01 < 0.02044 < 0.05, the null hypothesis is
rejected at level 0.05 but not at level 0.01.
10. Let $p_1$, $p_2$, and $p_3$ be the probabilities of ignition for the three materials. The null
and alternative hypotheses are

$H_0: p_1 = p_2 = p_3$ vs $H_a$: $H_0$ is not true.

To test the hypotheses, we use the R commands table=matrix(c(37, 74, 28, 57, 21,
79), nrow=2); chisq.test(table). The commands return the test statistic 4.7562
and a p-value of 0.09273, which is greater than the given significance level 0.05; thus
the null hypothesis should not be rejected.


11. Let $p_1, p_2, \ldots, p_5$ be the proportions of tractors that require warranty repair work
for the five locations. The null and alternative hypotheses are

$H_0: p_1 = p_2 = p_3 = p_4 = p_5$ vs $H_a$: $H_0$ is not true.

Using the $\chi^2$-test for this problem, we have

$\hat p_1 = 18/50 = 0.36$, $\hat p_2 = 8/50 = 0.16$, $\hat p_3 = 21/50 = 0.42$,
$\hat p_4 = 16/50 = 0.32$, and $\hat p_5 = 13/50 = 0.26$.

The overall proportion is

$$\hat p = \frac{\sum_{i=1}^{5} n_i \hat p_i}{N} = \frac{18 + 8 + 21 + 16 + 13}{50 \times 5} = 0.304.$$

The test statistic is

$$Q_{H_0} = \sum_{i=1}^{5} \frac{n_i(\hat p_i - \hat p)^2}{\hat p(1 - \hat p)} = \frac{50}{0.304(1 - 0.304)}\left[(0.36 - 0.304)^2 + (0.16 - 0.304)^2 + (0.42 - 0.304)^2 + (0.32 - 0.304)^2 + (0.26 - 0.304)^2\right] = 9.33908.$$

The degrees of freedom are $k - 1 = 4$; thus the p-value can be calculated by the
command 1 - pchisq(9.33908, 4), which returns 0.05316091. The p-value is greater
than the given significance level 0.05 and, therefore, we cannot reject $H_0$.

The R commands table=matrix(c(18, 32, 8, 42, 21, 29, 16, 34, 13, 37), nrow=2);
chisq.test(table) return the same value for the test statistic and p-value.

To make the testing procedure valid, we need the assumption that the samples are
independent.
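
As a cross-check, the statistic can be computed by hand in R and compared with chisq.test(). This is a sketch using only the counts quoted above (18, 8, 21, 16, 13 repairs out of 50 tractors per location); the variable names are ours.

```r
# Chi-square test for equality of k proportions, computed by hand.
x <- c(18, 8, 21, 16, 13)   # tractors needing warranty repair
n <- rep(50, 5)             # tractors sampled per location

p_hat  <- x / n
p_pool <- sum(x) / sum(n)   # overall proportion

Q <- sum(n * (p_hat - p_pool)^2) / (p_pool * (1 - p_pool))
p_value <- 1 - pchisq(Q, df = length(x) - 1)

# Built-in version on the 2 x 5 table of repaired / not repaired counts.
tab <- rbind(repaired = x, not_repaired = n - x)
builtin <- chisq.test(tab)

Q; p_value; builtin$statistic
```

For a 2 x k table the Pearson statistic reported by chisq.test() is algebraically the same as Q, and no continuity correction is applied when k > 2.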


12. Let $p_i$ be the probability that the inner glass ply breaks (IPBs) for configuration $i$,
$i = 1, 2, 3, 5$. The null and alternative hypotheses are

$H_0: p_1 = p_2 = p_3 = p_5$ vs $H_a$: $H_0$ is not true.

To test the hypotheses, we use the R commands table=matrix(c(91, 14, 128, 20,
46, 41, 62, 31), nrow=2); chisq.test(table). The commands return the test statistic
44.7619 and a p-value of $1.04 \times 10^{-9}$, which leads to the null hypothesis being
rejected.
10.3 Simultaneous CIs and Multiple Comparisons


1. (a) Let $\mu_S$, $\mu_C$, and $\mu_G$ be the mean total strain amplitudes of the
three types of cast iron. The null and alternative hypotheses are

$H_0: \mu_S = \mu_C = \mu_G$ vs $H_a$: $H_0$ is not true.

(b) The command gives the p-value 0.3158; thus the null hypothesis should
not be rejected at level 0.05.

(c) Since $H_0$ is not rejected, there is no need to conduct multiple comparisons to
determine which pairs of populations differ at experiment-wise level of
significance 0.05.


2. (a) There are $m = 3$ contrasts and therefore, we should test each of them at level
$\alpha/m = 0.05/3 = 0.0167$.

For $H_{10}: \sigma_1^2 = \sigma_2^2$, we calculate the F-statistic as

$$F_{H_{10}} = \frac{S_1^2}{S_2^2} = \frac{16^2}{27^2} = 0.3512.$$

The degrees of freedom are $n_1 - 1 = 35$ and $n_2 - 1 = 48$. Thus, the p-value
is calculated by the command 2*min(1-pf(0.3512, 35, 48), 1-pf(1/0.3512, 48,
35)), which is 0.00167, and we should reject $H_{10}$.

For $H_{20}: \sigma_1^2 = \sigma_3^2$, we calculate the F-statistic as

$$F_{H_{20}} = \frac{S_1^2}{S_3^2} = \frac{16^2}{38^2} = 0.1773.$$

The degrees of freedom are $n_1 - 1 = 35$ and $n_3 - 1 = 41$. Thus, the p-value
is calculated by the command 2*min(1-pf(0.1773, 35, 41), 1-pf(1/0.1773, 41,
35)), which is $9.78 \times 10^{-7}$. Thus, we should reject $H_{20}$.

For $H_{30}: \sigma_2^2 = \sigma_3^2$, we calculate the F-statistic as

$$F_{H_{30}} = \frac{S_2^2}{S_3^2} = \frac{27^2}{38^2} = 0.5048.$$

The degrees of freedom are $n_2 - 1 = 48$ and $n_3 - 1 = 41$. Thus, the p-value
is calculated by the command 2*min(1-pf(0.5048, 48, 41), 1-pf(1/0.5048, 41,
48)), which is 0.0234. Thus, we should not reject $H_{30}$.

Summarizing the results of the testing, we conclude that the homoscedasticity
assumption does not hold.
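
The three Bonferroni-adjusted F-tests can be run in one pass. The sketch below assumes only the sample standard deviations (16, 27, 38) and sample sizes (36, 49, 42) used above; the helper name var_ratio_p is ours.

```r
# Pairwise two-sided F-tests for equal variances with a Bonferroni adjustment.
s <- c(16, 27, 38)     # sample standard deviations
n <- c(36, 49, 42)     # sample sizes
alpha <- 0.05

pairs <- combn(3, 2)   # columns: (1,2), (1,3), (2,3)
m <- ncol(pairs)

var_ratio_p <- function(i, j) {
  f <- s[i]^2 / s[j]^2
  lower <- pf(f, n[i] - 1, n[j] - 1)
  2 * min(lower, 1 - lower)          # two-sided p-value
}

p_values <- apply(pairs, 2, function(ij) var_ratio_p(ij[1], ij[2]))
reject   <- p_values < alpha / m     # compare with 0.05 / 3
rbind(pairs, p_values, reject)
```

Each test is carried out at level 0.05/3, so the chance of at least one false rejection among the three is at most 0.05.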


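(b) We need to test three hypotheses, $H_{10}: \mu_1 = \mu_2$, $H_{20}: \mu_1 = \mu_3$, and
$H_{30}: \mu_2 = \mu_3$, and we should test each of them at level $\alpha/m = 0.05/3 = 0.0167$.

For $H_{10}: \mu_1 = \mu_2$, we calculate

$$T_{H_{10}} = \frac{\bar X_1 - \bar X_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} = \frac{754 - 769}{\sqrt{\frac{16^2}{36} + \frac{27^2}{49}}} = -3.199.$$

The degrees of freedom are

$$\nu = \left[\frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{(S_1^2/n_1)^2}{n_1 - 1} + \frac{(S_2^2/n_2)^2}{n_2 - 1}}\right] = \left[\frac{\left(\frac{16^2}{36} + \frac{27^2}{49}\right)^2}{\frac{(16^2/36)^2}{36 - 1} + \frac{(27^2/49)^2}{49 - 1}}\right] = [79.84] = 79.$$

Thus, the p-value is calculated by 2*pt(-3.199, 79), which returns 0.002.
Therefore, we should reject $H_{10}$.

For $H_{20}: \mu_1 = \mu_3$, we calculate

$$T_{H_{20}} = \frac{\bar X_1 - \bar X_3}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_3^2}{n_3}}} = \frac{754 - 776}{\sqrt{\frac{16^2}{36} + \frac{38^2}{42}}} = -3.415.$$

The degrees of freedom are

$$\nu = \left[\frac{\left(\frac{16^2}{36} + \frac{38^2}{42}\right)^2}{\frac{(16^2/36)^2}{36 - 1} + \frac{(38^2/42)^2}{42 - 1}}\right] = [56.86] = 56.$$

Thus, the p-value is calculated by 2*pt(-3.415, 56), which returns 0.0012.
Therefore, we should reject $H_{20}$.

For $H_{30}: \mu_2 = \mu_3$, we calculate

$$T_{H_{30}} = \frac{\bar X_2 - \bar X_3}{\sqrt{\frac{S_2^2}{n_2} + \frac{S_3^2}{n_3}}} = \frac{769 - 776}{\sqrt{\frac{27^2}{49} + \frac{38^2}{42}}} = -0.9974.$$

The degrees of freedom are

$$\nu = \left[\frac{\left(\frac{27^2}{49} + \frac{38^2}{42}\right)^2}{\frac{(27^2/49)^2}{49 - 1} + \frac{(38^2/42)^2}{42 - 1}}\right] = [72.56] = 72.$$

Thus, the p-value is calculated by 2*pt(-0.9974, 72), which returns 0.3219.
Therefore, we should not reject $H_{30}$.

Summarizing the results of the testing, we conclude that at experiment-wise
significance level 0.05, $\mu_2$ and $\mu_3$ are not significantly different, while the pairs
$(\mu_1, \mu_2)$ and $(\mu_1, \mu_3)$ are significantly different.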
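
The three unequal-variance t-tests follow the same recipe, which can be sketched as a small R function. The function name welch_from_summary is ours; it rounds the degrees of freedom down to an integer, as in the hand calculation above (R's own t.test() keeps the fractional value).

```r
# Unequal-variance (Welch-type) t statistic from summary statistics,
# with the degrees of freedom rounded down to an integer.
welch_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  t_stat <- (m1 - m2) / sqrt(v1 + v2)
  nu <- floor((v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1)))
  c(T = t_stat, df = nu, p_value = 2 * pt(-abs(t_stat), nu))
}

t12 <- welch_from_summary(754, 16, 36, 769, 27, 49)
t13 <- welch_from_summary(754, 16, 36, 776, 38, 42)
t23 <- welch_from_summary(769, 27, 49, 776, 38, 42)
rbind(t12, t13, t23)
```

Each p-value is then compared with 0.05/3 = 0.0167 to keep the experiment-wise level at 0.05.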
3. (a) Let $\mu_1$, $\mu_2$, and $\mu_3$ be the average spark plug resistance at the three different
blow-off pressures. Then the null and alternative hypotheses are

$H_0: \mu_1 = \mu_2 = \mu_3$ vs $H_a$: $H_0$ is not true.

From the given information, we calculate

$$\bar X = \frac{1}{150 + 150 + 150}(150 \times 5.365 + 150 \times 5.415 + 150 \times 5.883) = 5.554$$

and

$$SSTr = \sum_{i=1}^{3} n_i(\bar X_i - \bar X)^2 = 150(5.365 - 5.554)^2 + 150(5.415 - 5.554)^2 + 150(5.883 - 5.554)^2 = 24.492.$$

Thus,

$$MSTr = \frac{SSTr}{k - 1} = \frac{24.492}{3 - 1} = 12.246.$$

The mean square error is

$$MSE = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2 + (n_3 - 1)S_3^2}{n_1 + n_2 + n_3 - k} = \frac{(150 - 1) \times 2.241 + (150 - 1) \times 1.438 + (150 - 1) \times 1.065}{150 + 150 + 150 - 3} = 1.581.$$

Finally, the F statistic is

$$F_{H_0} = \frac{MSTr}{MSE} = \frac{12.246}{1.581} = 7.746$$

and the degrees of freedom are $k - 1 = 2$ and $N - k = 150 + 150 + 150 - 3 = 447$.
$F_{k-1,N-k,\alpha}$ can be calculated by qf(1-0.05, 2, 447), which is 3.016. Clearly,
$F_{H_0} > F_{k-1,N-k,\alpha}$. The p-value is calculated by 1-pf(7.746, 2, 447), which
returns 0.00049. Thus, we should reject the null hypothesis at the 0.05 level.
To make the test procedure valid, we need the assumption that the samples
are independent and the population variances are equal.

(b) For three different pressures, we have $m = 3$ mean contrasts: $\mu_1 - \mu_2$, $\mu_1 - \mu_3$,
and $\mu_2 - \mu_3$. For each of the three contrasts, we need to test the null
hypothesis that the contrast is zero vs the alternative that it is not zero at
level of significance $\alpha/m = 0.05/3 = 0.0167$.

For $\mu_1 - \mu_2$:

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{(150 - 1)2.241 + (150 - 1)1.438}{150 + 150 - 2} = 1.8395$$

and

$$T_{H_0} = \frac{\bar X_1 - \bar X_2}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{5.365 - 5.415}{\sqrt{1.8395\left(\frac{1}{150} + \frac{1}{150}\right)}} = -0.3192646.$$

The p-value is calculated by 2*pt(-0.3192646, 150+150-2), which is 0.7497496
and is greater than 0.0167. Thus $H_0$ should not be rejected and we conclude
that $\mu_1$ and $\mu_2$ are not significantly different.

For $\mu_1 - \mu_3$:

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_3 - 1)S_3^2}{n_1 + n_3 - 2} = \frac{(150 - 1)2.241 + (150 - 1)1.065}{150 + 150 - 2} = 1.653$$

and

$$T_{H_0} = \frac{\bar X_1 - \bar X_3}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_3}\right)}} = \frac{5.365 - 5.883}{\sqrt{1.653\left(\frac{1}{150} + \frac{1}{150}\right)}} = -3.489185.$$

The p-value is calculated by 2*pt(-3.489185, 150+150-2), which is 0.00056
and is less than 0.0167. Thus $H_0$ should be rejected and we conclude that
$\mu_1$ and $\mu_3$ are significantly different.

For $\mu_2 - \mu_3$:

$$S_p^2 = \frac{(n_2 - 1)S_2^2 + (n_3 - 1)S_3^2}{n_2 + n_3 - 2} = \frac{(150 - 1)1.438 + (150 - 1)1.065}{150 + 150 - 2} = 1.2515$$

and

$$T_{H_0} = \frac{\bar X_2 - \bar X_3}{\sqrt{S_p^2\left(\frac{1}{n_2} + \frac{1}{n_3}\right)}} = \frac{5.415 - 5.883}{\sqrt{1.2515\left(\frac{1}{150} + \frac{1}{150}\right)}} = -3.622939.$$

The p-value is calculated by 2*pt(-3.622939, 150+150-2), which is 0.000342
and is less than 0.0167. Thus $H_0$ should be rejected and we conclude that
$\mu_2$ and $\mu_3$ are significantly different.
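
The three pooled-variance comparisons can be reproduced with a short helper. This is a sketch under the assumptions used above (sample variances 2.241, 1.438, 1.065 and 150 observations per pressure); the function name pooled_t is ours.

```r
# Pooled-variance two-sample t statistic from summary statistics.
# m: sample means, v: sample variances, n: sample sizes.
pooled_t <- function(m1, v1, n1, m2, v2, n2) {
  sp2 <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
  t_stat <- (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
  df <- n1 + n2 - 2
  c(T = t_stat, p_value = 2 * pt(-abs(t_stat), df))
}

means <- c(5.365, 5.415, 5.883)
vars  <- c(2.241, 1.438, 1.065)
n <- 150

t12 <- pooled_t(means[1], vars[1], n, means[2], vars[2], n)
t13 <- pooled_t(means[1], vars[1], n, means[3], vars[3], n)
t23 <- pooled_t(means[2], vars[2], n, means[3], vars[3], n)

# Bonferroni: compare each p-value with 0.05 / 3.
rbind(t12, t13, t23)
```

Only the first comparison has a p-value above 0.05/3, in line with the conclusions stated above.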


4. Let $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$ be the average pore size of carbon made at the four different
temperatures (300, 400, 500, 600).

(a) To compute Tukey's 95% simultaneous CIs, we use the following commands:

pc = read.table("PorousCarbon.txt", header = T); attach(pc);
TukeyHSD(aov(values~as.factor(temp)), conf.level=0.95).

The results are summarized in the table below.
Contrast | Simultaneous 95% CI | Contains 0?
$\mu_2 - \mu_1$ | (-0.841, 0.461) | Yes
$\mu_3 - \mu_1$ | (-1.421, -0.119) | No
$\mu_4 - \mu_1$ | (-1.841, -0.539) | No
$\mu_3 - \mu_2$ | (-1.231, 0.071) | Yes
$\mu_4 - \mu_2$ | (-1.651, -0.349) | No
$\mu_4 - \mu_3$ | (-1.071, 0.231) | Yes

We see that the pairs $(\mu_3, \mu_1)$, $(\mu_4, \mu_1)$, and $(\mu_4, \mu_2)$ are significantly different at the
experiment-wise level of significance 0.05. The plot is given below.

[Figure: Tukey 95% family-wise confidence intervals for the differences in mean levels of as.factor(temp)]

(b) To perform Tukey's multiple comparisons procedure on the ranks, we use the
following commands:

r=rank(values); s=as.factor(temp); TukeyHSD(aov(r~s), conf.level=0.95).

The results are summarized in the table below.

Contrast | Simultaneous 95% CI | Contains 0?
$\tilde\mu_2 - \tilde\mu_1$ | (-6.703, 4.303) | Yes
$\tilde\mu_3 - \tilde\mu_1$ | (-13.603, -2.597) | No
$\tilde\mu_4 - \tilde\mu_1$ | (-17.803, -6.797) | No
$\tilde\mu_3 - \tilde\mu_2$ | (-12.403, -1.397) | No
$\tilde\mu_4 - \tilde\mu_2$ | (-16.603, -5.597) | No
$\tilde\mu_4 - \tilde\mu_3$ | (-9.703, 1.303) | Yes

We see that the pairs $(\tilde\mu_3, \tilde\mu_1)$, $(\tilde\mu_4, \tilde\mu_1)$, $(\tilde\mu_3, \tilde\mu_2)$, and $(\tilde\mu_4, \tilde\mu_2)$ are significantly
different at the experiment-wise level of significance 0.05. The plot is given on the next
page.

[Figure: Tukey 95% family-wise confidence intervals for the rank-based contrasts]

(c) Omitted


5. (a) Let 1, “2 and pz be the average scores for the three teaching methods, then 


S 


(c 


) 


the null and alternative hypotheses are 
Alo: 1 = fo = ps vs HA,: Ho is not true. 
To perform the ANOVA test, we use the following commands: 


GTM= read.table( “Grades TeachMeth.tat”, header = T); attach(GTM); 
anova(aov(score~as.factor(method))). 


The commands give the p-value as 0.01773, which is less than the given sig- 
nificant level 0.05. Thus, the null hypothesis should be rejected. 


To make the testing procedure valid, we need the assumptions that the samples 
are independent and the populations are normal with equal variances. 


(b) To construct Tukey's 95% simultaneous CIs for all pairwise contrasts, we
use the command TukeyHSD(aov(score~as.factor(method)), conf.level=0.95).
The simultaneous CIs for $\mu_2 - \mu_1$, $\mu_3 - \mu_1$, and $\mu_3 - \mu_2$ are (-3.72, 14.97), (2.28,
20.97), and (-3.35, 15.35), respectively. Only the interval for $\mu_3 - \mu_1$ excludes 0, so
only teaching methods 1 and 3 are significantly different.


(c) To test the assumption of equal variances, we use the command
anova(aov(resid(aov(score~as.factor(method)))**2~as.factor(method))),
and it returns a p-value of 0.8972. This suggests that the assumption of equal
variances is plausible. To test normality, we use the command
shapiro.test(resid(aov(score~as.factor(method)))),
and it returns a p-value of 0.8471. This suggests that the normality assumption
is also plausible.


Thus, the procedures in parts (a) and (b) are valid. 


6. (a) To conduct the Kruskal-Wallis test at level 0.05, we use the command
kruskal.test(score~as.factor(method)).


The p-value is given as 0.02112. Thus, we should reject the null hypothesis 
that the distributions are the same. To make the test valid, we need to assume 
that the populations are continuous. 


(b) The three teaching methods yield m = 3 possible pairwise median contrasts:
$\tilde{\mu}_2 - \tilde{\mu}_1$, $\tilde{\mu}_3 - \tilde{\mu}_1$, and $\tilde{\mu}_3 - \tilde{\mu}_2$. The hypothesis that each of the contrasts
is zero vs the two-sided alternative will be tested at an individual level of
$\alpha/3 = 0.0167$. We use the following commands: f1 = score[method==1]; f2 =
score[method==2]; f3 = score[method==3]; wilcox.test(f2, f1); wilcox.test(f3,
f1); wilcox.test(f3, f2). The results are summarized in the table below.


Contrast | p-value for $H_0: \tilde{\mu}_i - \tilde{\mu}_j = 0$ | Less than 0.0167?
$\tilde{\mu}_2 - \tilde{\mu}_1$ | 0.1049 | No
$\tilde{\mu}_3 - \tilde{\mu}_1$ | 0.01041 | Yes
$\tilde{\mu}_3 - \tilde{\mu}_2$ | 0.1412 | No


Thus, methods 3 and 1 are significantly different at experiment-wise level of 
significance 0.05. 


(c) To perform Tukey’s multiple comparisons procedure on the ranks, we use the 
commands 


r=rank(score); s=as.factor(method); TukeyHSD(aov(r~s), conf.level=0.95).


The simultaneous CIs for $\tilde{\mu}_2 - \tilde{\mu}_1$, $\tilde{\mu}_3 - \tilde{\mu}_1$, and $\tilde{\mu}_3 - \tilde{\mu}_2$ are given as (-2.414,
12.789), (2.211, 17.414), and (-2.976, 12.226), respectively. Thus, only methods
3 and 1 are significantly different at experiment-wise level of significance 0.05.
The plot is given on the next page.
7. We use the following commands:


k=3; alpha=0.05/(k*(k-1)/2); o=c(37, 28, 21); n=c(111, 85, 100);
for(i in 1:(k-1)){ for(j in (i+1):k){
print(prop.test(c(o[i], o[j]), c(n[i], n[j]), conf.level=1-alpha,
correct=F)$conf.int)}}


The CIs for $p_1 - p_2$, $p_1 - p_3$, and $p_2 - p_3$ are given as (-0.158, 0.166), (-0.022, 0.268),
and (-0.037, 0.276), respectively. The CIs all contain 0 and, thus, none of the contrasts
is significantly different from zero.


8. We use the following commands: 


k=4; alpha=0.05/(k*(k-1)/2); o=c(91, 128, 46, 62); n=c(105, 148, 87, 93);
for(i in 1:(k-1)){ for(j in (i+1):k){
print(prop.test(c(o[i], o[j]), c(n[i], n[j]), conf.level=1-alpha,
correct=F)$conf.int)}}


The CIs for $p_1 - p_2$, $p_1 - p_3$, $p_1 - p_4$, $p_2 - p_3$, $p_2 - p_4$, and $p_3 - p_4$ are given as (-0.113,
0.117), (0.172, 0.504), (0.044, 0.356), (0.177, 0.496), (0.049, 0.347), and (-0.329,
0.053). The results show that $p_1$ and $p_2$, and $p_3$ and $p_4$, are not significantly different at
experiment-wise level 0.05. All other contrasts are significantly different from zero.
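
The Bonferroni intervals above can also be reproduced by hand, which makes the adjustment explicit. The following is a minimal sketch, assuming the counts and sample sizes of Exercise 8; the helper name `wald_ci` is hypothetical and introduced only for illustration. It uses the fact that `prop.test` with `correct=F` reports the Wald interval for a difference of two proportions.

```r
# Counts and sample sizes as given in Exercise 8
o <- c(91, 128, 46, 62)
n <- c(105, 148, 87, 93)
k <- length(o)
m <- k * (k - 1) / 2      # number of pairwise contrasts (6)
alpha <- 0.05 / m         # Bonferroni-adjusted individual level

# Hypothetical helper: Wald interval for p_i - p_j at confidence level 1 - alpha
wald_ci <- function(i, j) {
  p <- o / n
  se <- sqrt(p[i] * (1 - p[i]) / n[i] + p[j] * (1 - p[j]) / n[j])
  (p[i] - p[j]) + c(-1, 1) * qnorm(1 - alpha / 2) * se
}

# Rows are the pairs (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
pairs <- t(combn(k, 2))
cis <- t(apply(pairs, 1, function(ij) wald_ci(ij[1], ij[2])))
result <- data.frame(i = pairs[, 1], j = pairs[, 2],
                     lower = cis[, 1], upper = cis[, 2],
                     contains0 = cis[, 1] < 0 & cis[, 2] > 0)
print(result)
```

Each interval uses the quantile `qnorm(1 - alpha/2)` with `alpha = 0.05/6`, so the six intervals hold simultaneously with confidence at least 95%.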
10.4 Randomized Block Designs


1. (a) The ANOVA F-test procedure in Section 10.2 is not recommended because of 
the presence of a block effect. 
To incorporate additional information, we use the model
$$X_{ij} = \mu + \alpha_i + b_j + \epsilon_{ij}, \quad i = 1, 2, 3 \ \text{and} \ j = 1, \ldots, 9,$$
with $\sum_i \alpha_i = 0$ and $\mathrm{Var}(b_j) = \sigma_b^2$.
(b) To decide if variation in the mole fraction of water affects the mean diffusivity,
we have the null and alternative hypotheses as


$H_0: \alpha_1 = \alpha_2 = \alpha_3$ vs $H_a$: $H_0$ is not true.


(c) To test the hypotheses, we use the following commands 


MF = st$ind; temp = as.factor(rep(1:length(CRSMF1),3)); 
summary (aov(st$values~ MF+temp)). 


The obtained ANOVA table is:
            DF      SS      MS      F   P
MF           2   0.019  0.0095    128   $1.44 \times 10^{-10}$
temp         8  24.838  3.1048  41654   $< 2 \times 10^{-16}$
Residuals   16   0.001  0.0001


Since the p-value for MF is $1.44 \times 10^{-10}$, we should reject $H_0$.

(d) In Exercise 5 of Section 10.2, we only considered treatment effect. Using the 
command summary(aov(st$values~ MF)), we have the p-value of 0.991, which 
is the same as in Exercise 5 of Section 10.2. This analysis is not appropriate 
because the samples are not independent. 


(e) To construct Tukey's 99% simultaneous CIs, we use the command
TukeyHSD(aov(st$values~MF+temp), "MF", conf.level=0.99), and the results
are below.


Comparison | 99% Tukey's SCI | Contains 0?
$\mu_2 - \mu_1$ | (-0.0027, 0.0249) | Yes
$\mu_3 - \mu_1$ | (0.0474, 0.0749) | No
$\mu_3 - \mu_2$ | (0.0362, 0.0638) | No


We can conclude that $\mu_1$ and $\mu_2$ are not significantly different at the 99% level,
but the other contrasts are significant.


2. (a) An appropriate model for the observation is
$$X_{ij} = \mu + \alpha_i + b_j + \epsilon_{ij}, \quad i = 1, 2, 3 \ \text{and} \ j = 1, 2, 3,$$
with $\sum_i \alpha_i = 0$ and $\mathrm{Var}(b_j) = \sigma_b^2$. The random blocks correspond to the
technicians.

(b) To test the null hypothesis that the average service time of the three drives is
the same, that is,


$H_0: \alpha_1 = \alpha_2 = \alpha_3$ vs $H_a$: $H_0$ is not true,


we use the commands 


time = c(44.8, 98.4, 40.2, 47.8, 01.2, 60.8, 73.4, 71.2, 61.6);
Drive = as.factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3)); Tech = as.factor(rep(1:3, 3));
anova(aov(time~Drive+Tech)).


The ANOVA table is given below. 


            DF       SS      MS        F   P
Drive        2  1229.66  614.83  10.1183   0.02724
Tech         2     4.92    2.46   0.0404   0.96075
Residuals    4   243.06   60.76


The p-value is 0.02724. 


(c) We use the command friedman.test(time, Drive, Tech) for the Friedman’s test 
and we get 6 as the test statistic and 0.04979 as the p-value. 


3. (a) The appropriate model is
$$X_{ij} = \mu + \alpha_i + b_j + \epsilon_{ij}, \quad i = 1, 2, 3, 4 \ \text{and} \ j = 1, \ldots, 8,$$
with $\sum_i \alpha_i = 0$ and $\mathrm{Var}(b_j) = \sigma_b^2$. The parameters $\alpha_i$ specify the treatment
(design) effects, and the parameters $b_j$ represent the random block (pilot)
effects.


(b) For deciding if the designs differ in terms of the pilot's mean response time,
we have the null and alternative hypotheses as


$H_0: \alpha_1 = \alpha_2 = \alpha_3 = \alpha_4$ vs $H_a$: $H_0$ is not true.


(c) Load the data by PRT = read.table("PilotReacTimes.txt", header = T);
attach(PRT). Fit the model by fit = aov(times~design+pilot). For checking the
assumption of homoscedasticity of the intrinsic error variables of the model
in (a), we use anova(aov(resid(fit)**2~design+pilot)). The returned p-values
are 0.231 and 0.098 for the design and pilot effects on the residual variance,
suggesting there is no strong evidence against the homoscedasticity assumption.

To check the normality assumption, we use the command shapiro.test(resid(fit)),
which returns a p-value of 0.7985, suggesting the normality assumption is
reasonable for these data.
We use the command boxplot(resid(fit)) to get the boxplot of the residuals,
and the command plot(fit, which=2) to get the Q-Q plot for the residuals,
shown on the next page.

[Figure: boxplot and normal Q-Q plot of the residuals.]

These plots also suggest that the normality assumption is satisfied, in agreement
with the p-value.


(d) To test the hypotheses in part (b), we use the command anova(fit), which
returns p-values of 0.00044 and 1.49 x 10~° for the design and pilot effects on 
the response time. Thus, the null hypothesis in part (b) is rejected. 


4. To construct Tukey's 99% simultaneous CIs, we use the command TukeyHSD(fit,
"design", conf.level=0.99). The results are listed below.

Comparison | 99% Tukey's SCI | Contains 0?
B-A | (-1.81, 0.13) | Yes
C-A | (-0.98, 0.96) | Yes
D-A | (-2.13, -0.19) | No
C-B | (-0.14, 1.80) | Yes
D-B | (-1.30, 0.65) | Yes
D-C | (-2.13, -0.18) | No

We can conclude that the pairs of designs (D, A) and (D, C) are significantly different
at the 99% level, but the other pairs of designs are not significantly different. The
plot is produced by the command plot(TukeyHSD(fit, "design", conf.level=0.99)),
and is shown below.

[Figure: Tukey simultaneous confidence intervals, 99% family-wise confidence level. Horizontal axis: differences in mean levels of design.]

5. (a) “Fabric” is the blocking factor. 


(b) The completed ANOVA table is given below. 


            DF  Sum Sq    Mean Sq  F-Value   P
treatment    3  2.4815  0.8271667   19.425   $6.72 \times 10^{-5}$
block        4  5.4530  1.36325     32.014   $2.57 \times 10^{-6}$
Residuals   12  0.5110  0.042583


Based on the p-value, at level $\alpha = 0.05$, we should reject the null hypothesis
that the four chemicals do not differ in terms of the mean strength of the
fabric.
6. (a) The command gives 0.0001955 as the p-value for testing the null hypothesis
that the four chemicals do not differ in terms of the mean strength of the
fabric. The hypothesis is rejected at level $\alpha = 0.01$.


(b) To conduct Tukey's multiple comparisons on the ranks, we use the command


TukeyHSD(aov(ranks~fs$chemical+fs$fabric), "fs$chemical", conf.level=0.99).
The results are listed below.


Comparison | 99% Tukey's SCI | Contains 0?
B vs A | (1.22, 11.98) | No
C vs A | (-2.88, 7.88) | Yes
D vs A | (3.12, 13.88) | No
C vs B | (-9.48, 1.28) | Yes
D vs B | (-3.48, 7.28) | Yes
D vs C | (0.62, 11.38) | No


Based on the simultaneous CIs, we find that at experiment-wise level of
significance 0.01, the pairs (B, A), (D, A), and (D, C) are different, since their
intervals do not contain 0.


7. (a) We use the commands 


At = c(19.0, 21.8, 16.8, 24.2, 22.0, 34.7, 23.8);
Bt = c(17.8, 20.2, 16.2, 41.4, 21.4, 28.4, 22.7);
Ct = c(21.3, 22.5, 17.6, 38.1, 25.8, 39.4, 23.9);
t.test(At, Bt, paired=T, conf.level=1-0.05/3)
t.test(At, Ct, paired=T, conf.level=1-0.05/3)
t.test(Bt, Ct, paired=T, conf.level=1-0.05/3).
Bonferroni's 95% SCIs for $\mu_A - \mu_B$, $\mu_A - \mu_C$, and $\mu_B - \mu_C$ are
(-10.136, 8.479), (-9.702, 2.188), and (-8.301, 2.444), respectively. Since all of
them include 0, none of the differences is significantly different from zero at
experiment-wise significance level 0.05.
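
To make the Bonferroni adjustment in the paired t intervals explicit, the interval for $\mu_A - \mu_B$ can be computed directly from the differences. This is a minimal sketch, assuming the At and Bt values listed in part (a); it should agree with the first `t.test` call there.

```r
# Observations for treatments A and B, as listed in part (a)
At <- c(19.0, 21.8, 16.8, 24.2, 22.0, 34.7, 23.8)
Bt <- c(17.8, 20.2, 16.2, 41.4, 21.4, 28.4, 22.7)

m <- 3                 # number of pairwise comparisons
alpha <- 0.05 / m      # Bonferroni-adjusted individual level

d <- At - Bt           # paired differences
n <- length(d)

# Paired t interval: mean(d) +/- t_{n-1, alpha/2} * sd(d) / sqrt(n)
half_width <- qt(1 - alpha / 2, df = n - 1) * sd(d) / sqrt(n)
ci <- mean(d) + c(-1, 1) * half_width
print(round(ci, 3))
```

The same three lines, applied to `At - Ct` and `Bt - Ct`, give the other two simultaneous intervals.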


(b) To perform Bonferroni’s multiple comparisons by the signed-rank test at 
experiment-wise error rate of 0.05, we use the commands 


wilcox.test(At-Bt, conf.int = T, conf.level=1-0.05/3)
wilcox.test(At-Ct, conf.int = T, conf.level=1-0.05/3)
wilcox.test(Bt-Ct, conf.int = T, conf.level=1-0.05/3).
Bonferroni's 95% SCIs for $\tilde{\mu}_A - \tilde{\mu}_B$, $\tilde{\mu}_A - \tilde{\mu}_C$, and $\tilde{\mu}_B - \tilde{\mu}_C$ are (-8.30,
3.75), (-13.9, -0.1), and (-11.0, 3.3), respectively. The difference $\tilde{\mu}_A - \tilde{\mu}_C$ is
significantly different from zero at experiment-wise error rate of 0.05, but the
other differences are not.

Chapter 11


Multifactor Experiments 


11.2 Two-Factor Designs 


1. (a) $F^{GS}_{H_0} = 0.6339$ with p-value of 0.4286; the hypothesis of no interaction effect is
not rejected at level of significance 0.05.


$F^{G}_{H_0} = 141.78$ with p-value of less than $2.2 \times 10^{-16}$; the hypothesis of no main
growth hormone effects is rejected at level of significance 0.05.


$F^{S}_{H_0} = 18.96$ with p-value of $4.46 \times 10^{-5}$; the hypothesis of no main sex steroid
effects is rejected at level of significance 0.05.


(b) The interaction plot is given in the following plot. It is observed that the two 
lines are roughly parallel, which indicates there is no interaction effect, and 
this is consistent with the formal F-test. 


[Figure: interaction plot of the cell means of Y.]


(c) The residual plot is given in the figure below. This figure shows that the
homoscedasticity assumption approximately holds.

[Figure: residuals vs fitted values.]

The Q-Q plot for the residuals is shown below. The figure shows that the
normality assumption holds approximately.

[Figure: normal Q-Q plot of the residuals.]

(d) The p-values for testing the hypotheses of no main growth effects, no main
sex steroid effects, and no interaction effects on the residual variance are, re- 
spectively, 0.346, 0.427, and 0.299; none of these hypotheses are rejected. The 
p-value for the normality test is 0.199, so the normality assumption appears 
to be reasonable. 


2. (a) $F^{SC}_{H_0} = 1.8192$ with p-value of 0.0910; the hypothesis of no interaction effect is
not rejected at level of significance 0.01.
$F^{S}_{H_0} = 146.17$ with p-value of less than $2.2 \times 10^{-16}$; the hypothesis of no main
signal level effects is rejected at level of significance 0.01.
$F^{C}_{H_0} = 23.13$ with p-value of $1.345 \times 10^{-11}$; the hypothesis of no main cell phone
type effects is rejected at level of significance 0.01.


(b) From the result of multiple comparisons, it is seen that the following pairs of 
cell phone types are significantly different at level 0.01 in terms of their mean 
effects: B2-B1, B3-B1, B4-B1, B3-B2, B5-B2, and B5-B4. The pairs of signal
levels that are significantly different at level 0.01 in terms of their mean effects are
L-H, M-H, and M-L.


(c) The p-values for testing the hypotheses of no main signal level effects, no main 
cell phone type effects, and no interaction effects on the residual variance are, 
respectively, 0.485, 0.954, and 0.801; none of these hypotheses are rejected. 
The p-value for the normality test is 0.092, so the normality assumption is not 
strongly contradicted by the data. 


(d) The interaction plot is shown below. It shows that there might be slight 
interaction effects. This is in agreement with the p-value of 0.091 for the no 
interaction test. 


[Figure: interaction plot of the cell means of y.]


The residual plot is given below. The figure suggests that the homoscedasticity
assumption holds approximately.

[Figure: residuals vs fitted values.]

The Q-Q plot for the residuals is shown below, and the figure shows that the
normality assumption holds approximately.

[Figure: normal Q-Q plot of the standardized residuals against the theoretical quantiles, for aov(y ~ S * C).]


3. (a) The hypothesis of no interaction between temperature and humidity. 
(b) The hypothesis of no main humidity effect. 
4. (a) This is to test whether or not there is a main row effect: $H_0^A: \alpha_1 = \alpha_2 = \alpha_3 =
\alpha_4 = 0$.
(b) This is to test whether or not there is a main column effect: $H_0^B: \beta_1 = \beta_2 = 0$.


(c) The F-values for testing $H_0^A$, $H_0^B$, and $H_0^{AB}$ are respectively 2.1095, 6.2030, and
3.9496; the corresponding p-values are 0.1392, 0.0241, and 0.0277. Therefore,
at level 0.05, we would reject $H_0^B$ and $H_0^{AB}$.

(d) From part (c), we already concluded that there are no main factor A (humidity
level) effects; therefore, it is not necessary to conduct multiple comparisons.


5. (a) The residual plot is given below. The figure shows that the homoscedasticity
assumption approximately holds.

[Figure: residuals vs fitted values.]

The Q-Q plot for the residuals is shown below, and the figure shows that the
normality assumption holds approximately.

[Figure: normal Q-Q plot of the residuals against the theoretical quantiles, for aov(Inquiries ~ Day * Section).]


(b) The p-values for testing the hypotheses of no main day effects, no main section
effects, and no interaction effects on the residual variance are, respectively,
0.3634, 0.8096, and 0.6280; none of these hypotheses are rejected. The p-value
for the normality test is 0.3147, so the normality assumption appears to be
reasonable.

(c) The pairs of days, except for (M, T), (M, W) and (T,W), are significantly 
different at experiment-wise error rate a = 0.01. The pairs of newspaper 
sections (Sports, Business) and (Sports, News) are also significantly different. 


6. (a) The completed ANOVA table is shown below 


              DF      SS       MS      F   P
Species        2  146010    73005  5.954   0.016
Size           1    3308     3308  0.270   0.613
Interaction    2   41708    20854  1.701   0.224
Error         12  147138  12261.5
Total         17  338164


(b) The statistical model for $Y_{ijk}$ is $Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$. The assumption
is that the error variables $\epsilon_{ijk}$ are independent normal with zero mean and
common variance.


(c) To test whether an additive model is appropriate, the null hypothesis is $H_0^{AB}$:
$\gamma_{11} = \cdots = \gamma_{32} = 0$, and the alternative hypothesis is $H_a^{AB}$: at least one of the $\gamma_{ij}$
is not 0. By the p-value from the table (0.224), the null hypothesis is not rejected.


(d) By the p-values from the table, $H_0^A$ is rejected at level 0.05, but $H_0^B$ is not
rejected.


7. The completed ANOVA table is shown below 


DF SS MS F P 
Species 2 146010 73005 5.412 0.018 
Size 1 3308 3308 0.245 0.628 
Error 14 188846 13489 
Total 17 338164 


By the p-values from the table, $H_0^A$ is rejected at level 0.05, but $H_0^B$ is not rejected.
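
The table for the additive model can be derived from the table with interaction by pooling the interaction sum of squares and degrees of freedom into the error term. The following is a minimal sketch of that arithmetic, using the SS and DF values printed in the Exercise 6 table; the variable names are chosen here for illustration.

```r
# SS and DF from the two-factor table with interaction (Exercise 6)
ss <- c(Species = 146010, Size = 3308, Interaction = 41708, Error = 147138)
df <- c(Species = 2, Size = 1, Interaction = 2, Error = 12)

# Additive model: the interaction SS and DF are absorbed into the error term
sse_pooled <- ss[["Interaction"]] + ss[["Error"]]
dfe_pooled <- df[["Interaction"]] + df[["Error"]]
mse_pooled <- sse_pooled / dfe_pooled

# Main-effect F statistics and p-values against the pooled error
effects <- c("Species", "Size")
ms <- ss[effects] / df[effects]
f <- ms / mse_pooled
p <- pf(f, df[effects], dfe_pooled, lower.tail = FALSE)
print(round(cbind(DF = df[effects], MS = ms, F = f, P = p), 3))
```

Pooling raises the error degrees of freedom from 12 to 14 but also raises the error mean square, so the F statistics for the main effects are slightly smaller than in the model with interaction.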
8. The ANOVA table is calculated as


            DF        SS       MS        F   P
A            2   48.6534  24.3267  10.0801   $3.34 \times 10^{-4}$
B            2   22.0901  11.0451   4.5767   0.0169
A:B          4    5.9626   1.4906   0.6177   0.6528
Residuals   36   86.8800   2.4133
Total       44  163.5861


From the result, we can see that $H_0^A$ is rejected at level 0.01, but $H_0^B$ and $H_0^{AB}$ are
not rejected.

9. (a) The ANOVA table from the R commands is given as


            DF      SS     MS        F   P
A            4   52066  13016   3.4022   0.0661095
B            2  173333  86667  22.6528   0.0005073
Residuals    8   30607   3826
From the table, the hypothesis of no main row effect is not rejected, while the 
hypothesis of no main column effects is rejected. 


(b) The interaction plot is shown below, which clearly indicates that there is 
interactive effect. 
The ANOVA table from Tukey's one degree of freedom test is given as


            DF      SS     MS       F   P
A            4   52066  13016  14.491   0.001693
B            2  173333  86667  96.482   $8.026 \times 10^{-6}$
fitteds      1   24319  24319  27.073   0.001249
Residuals    7    6288    898


The result suggests the factors interact. 


(c) Yes 
10. (a) The ANOVA table from the R commands is given as 
            DF       SS       MS        F   P
S            2  0.77297  0.38649  63.8469   $1.208 \times 10^{-5}$
C            4  0.13129  0.03282   5.4224   0.02068
Residuals    8  0.04843  0.00605

From the table, the hypothesis of no main row effect is rejected, but the
hypothesis of no main column effects is not rejected.


(b) All pairs of the signal factor levels are significantly different at level 0.01, and 
no pair of the cell phone type factor levels is significantly different at level 
0.01. 


11. (a) The ANOVA table from the R commands is given as 


            DF        SS       MS       F   P
A            4  14815305  3703826  31.752   $2.347 \times 10^{-11}$
B            9  19608061  2178673  18.677   $4.612 \times 10^{-11}$
Residuals   36   4199367   116649


From the table, the hypotheses of no main row and no main column effects 
are both rejected. 


(b) All pairs of the Auxin factor levels are significantly different at level 0.01 except 
(0.5,0.1), (2.5,0.1) and (2.5,0.5 ). 


(c) The ANOVA table from Tukey’s one degree of freedom test is given as 


            DF        SS       MS        F   P
A            4  14815305  3703826   59.829   $3.657 \times 10^{-15}$
B            9  19608061  2178673   35.193   $6.015 \times 10^{-15}$
fitteds      1   2032633  2032633  32.8384   $1.752 \times 10^{-6}$
Residuals   35   2166733    61907


The result suggests the factors interact. 


12. (a) The calculated ANOVA table is 
        DF       SS      MS       F   P
A        1    0.125   0.125   0.051   0.8361
B        3  129.375  43.125  17.542   0.0209
Error    3    7.375  2.4583


(b) At level $\alpha = 0.05$, we would reject $H_0^B$ but retain $H_0^A$.


11.3 Three-Factor Designs 


1. (a) The full model is $X_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \epsilon_{ijkl}$.
The ANOVA table given by the R commands is

            DF        SS        MS        F   P
MS           1  0.062662  0.062662  55.6612   $1.452 \times 10^{-9}$
SH           2  0.005973  0.002987   2.6530   0.0807548
MH           1  0.015010  0.015010  13.3331   0.0006433
MS:SH        2  0.024394  0.012197  10.8342   0.0001309
MS:MH        1  0.010693  0.010693   9.4987   0.0034016
SH:MH        2  0.043146  0.021573  19.1629   $7.628 \times 10^{-7}$
MS:SH:MH     2  0.000008  0.000004   0.0037   0.9962765
Residuals   48  0.054037  0.001126


It is seen that all the main effects and interaction effects are significant at level
0.05, except for the main effect of SH and the three-factor interaction.


(b) The model without the three-factor interaction is $X_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k +
(\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + \epsilon_{ijkl}$. The ANOVA table given by the R commands
is

            DF        SS        MS        F   P
MS           1  0.062662  0.062662  57.9714   $6.606 \times 10^{-10}$
SH           2  0.005973  0.002987   2.7631   0.0727448
MH           1  0.015010  0.015010  13.8864   0.0004949
MS:SH        2  0.024394  0.012197  11.2839   $9.029 \times 10^{-5}$
MS:MH        1  0.010693  0.010693   9.8929   0.0027925
SH:MH        2  0.043146  0.021573  19.9582   $4.249 \times 10^{-7}$
Residuals   50  0.054046  0.001081


It is seen that all the main effects and interaction effects are significant at level
0.05, except for the main effect of SH.


(c) When testing the homoscedasticity, the p-values for the main factors, MS, 
SH, and MH are respectively, 0.010142, 0.108286, and 0.004175. The results 
suggest the homoscedasticity assumption does not hold. The p-value of the
Shapiro-Wilk test for normality is 0.07, but in the presence of heteroscedasticity
this is not easily interpretable.


(d) After the square root arcsine transformation on the response variable, only the 
main factor MH effect on the residual variance has a p-value less than 0.05 
(0.016). The p-value of the Shapiro-Wilk test is 0.64, suggesting the normality 
assumption is tenable. 


2. (a) The full model is $X_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \epsilon_{ijkl}$.
The ANOVA table given by the R commands is

            DF      SS      MS        F   P
IM           2  20.909  10.454   4.4881   0.022087
OT           1  22.216  22.216   9.5370   0.005028
TS           1   8.085   8.085   3.4707   0.074750
IM:OT        2   4.938   2.469   1.0599   0.362173
IM:TS        2   3.574   1.787   0.7672   0.475346
OT:TS        1  62.938  62.938  27.0189   $2.519 \times 10^{-5}$
IM:OT:TS     2   1.784   0.892   0.3830   0.685938
Residuals   24  55.905   2.329


It is seen that the main effects of IM and OT and the interaction effect of OT and
TS are significant at level 0.05.


(b) When testing the homoscedasticity, only the p-value for the three-factor
interaction effect is less than 0.05 (0.01212); thus, we would conclude that the
homoscedasticity assumption holds. The p-value of the Shapiro-Wilk test is
0.80, suggesting the normality assumption is tenable.


(c) The three interaction plots are given in the following graphs, and they suggest
that there are interaction effects between "insulation type" and "outside
temperature".

[Figure: three interaction plots.]

(d) The model without the $(\alpha\beta)$, $(\alpha\gamma)$, and $(\alpha\beta\gamma)$ interactions is $X_{ijkl} = \mu + \alpha_i +
\beta_j + \gamma_k + (\beta\gamma)_{jk} + \epsilon_{ijkl}$. With this model, the resulting ANOVA table is

            DF      SS      MS        F   P
IM           2  20.909  10.454   4.7375   0.016292
OT           1  22.216  22.216  10.0672   0.003473
TS           1   8.085   8.085   3.6636   0.065195
OT:TS        1  62.938  62.938  28.5210   $8.906 \times 10^{-6}$
Residuals   30  66.202   2.207

We can see that the main effects of IM and OT are significant at level 0.05 and the
interaction effect of OT and TS is significant at level 0.05.
3. (a) $x_{221} = ab = 16$, $x_{112} = c = 12$, and $x_{122} = bc = 18$.


(b) Same as in Table 11.2.


(c) $\alpha_1 = -1.375$, $\beta_1 = -1.625$, $\gamma_1 = -2.375$, $(\alpha\beta)_{11} = 0.875$, $(\alpha\gamma)_{11} = -0.375$,
$(\beta\gamma)_{11} = 0.875$, and $(\alpha\beta\gamma)_{111} = 1.375$.


(d) $SSA = 16 \times 1.375^2 = 30.25$, $SSB = 16 \times 1.625^2 = 42.25$, $SSC = 16 \times
2.375^2 = 90.25$, $SSAB = 16 \times 0.875^2 = 12.25$, $SSAC = 16 \times 0.375^2 = 2.25$,
$SSBC = 16 \times 0.875^2 = 12.25$, and $SSABC = 16 \times 1.375^2 = 30.25$.


(e) $F^A_{H_0} = 30.25/((12.25 + 2.25 + 12.25 + 30.25)/4) = 2.12$, $F^B_{H_0} = 42.25/((12.25 +
2.25 + 12.25 + 30.25)/4) = 2.96$, $F^C_{H_0} = 90.25/((12.25 + 2.25 + 12.25 + 30.25)/4) =
6.33$; these test statistics are all less than $F_{1,4,0.05}$, so none of the main effects
are significantly different from zero.
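
The sums of squares and F statistics above follow mechanically from the estimated effects. The following is a minimal sketch, assuming the effect estimates of part (c) (with interaction labels matched to the sums of squares in part (d)) and the multiplier 16 used in part (d); the interaction sums of squares are pooled to form the error term, as in the F ratios shown.

```r
# Estimated effects from part (c)
effects <- c(A = -1.375, B = -1.625, C = -2.375,
             AB = 0.875, AC = -0.375, BC = 0.875, ABC = 1.375)

# Sums of squares: 16 times the squared effect, as in part (d)
ss <- 16 * effects^2

# Under the assumption of no interactions, the four interaction SS are pooled
# into an error term with 4 degrees of freedom
inter <- c("AB", "AC", "BC", "ABC")
mse <- sum(ss[inter]) / length(inter)

# F statistics for the main effects, each on (1, 4) degrees of freedom
f <- ss[c("A", "B", "C")] / mse
crit <- qf(0.95, df1 = 1, df2 = 4)   # 5% critical value F_{1,4,0.05}

print(round(f, 2))
print(f > crit)
```

All three comparisons come out FALSE, which matches the conclusion that no main effect is significant under the no-interaction assumption.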

(f) The probability plot of the effects is given below. We notice that the two
outliers on the top right corner correspond to the three-factor effect and one
of the two-factor effects, which indicates that the assumption of no interaction
is not appropriate.

[Figure: normal Q-Q plot of the estimated effects; sample quantiles against theoretical quantiles.]

4. (a) Same as in Table 11.2.

(b) $\alpha_1 = -29.375$, $\beta_1 = 7.75$, $\gamma_1 = -13.25$, $(\alpha\beta)_{11} = 9.125$, $(\alpha\gamma)_{11} = -8.875$,
$(\beta\gamma)_{11} = -6.75$, and $(\alpha\beta\gamma)_{111} = -5.875$.

(c) The calculated sums of squares are SSA = 13806, SSB = 961, SSC = 2809,
SSAB = 1332.3, SSAC = 1260.3, SSBC = 729, and SSABC = 552.25.

(d) The calculated F values are $F^A_{H_0} = 6.5866$, $F^B_{H_0} = 0.4585$, $F^C_{H_0} = 1.3401$,
$F^{AB}_{H_0} = 0.6356$, $F^{AC}_{H_0} = 0.6012$, $F^{BC}_{H_0} = 0.3478$, and $F^{ABC}_{H_0} = 0.2635$. Since
$F_{1,8,0.05} = 5.32$, we conclude that only the main effect of Temperature is significant.

5. (a) True 

(b) True 

(c) $F^A_{H_0} = 47.956/((0.276+9.456+1.156+6.126)/4) = 11.27$, $F^B_{H_0} = 4.306/((0.276+
9.456+1.156+6.126)/4) = 1.01$, $F^C_{H_0} = 12.426/((0.276+9.456+1.156+6.126)/4) =
2.92$. The corresponding p-values are 0.03, 0.37, and 0.16. Thus, the hypothesis
of no main factor A effects is rejected and the main effects of factors B and C
are not significantly different from zero.


6. Since the factor C is fixed at level k, the term $\gamma_k$ in (11.3.1) is a constant. We collect
the constant terms and denote the new constant as $\mu^k = \mu + \gamma_k$, which is (a). Note
that this constant depends on the level k. Similarly, we collect all the terms that
depend on the main effect of factor A only and denote them as $\alpha_i^k$, resulting in (b)
$\alpha_i^k = \alpha_i + (\alpha\gamma)_{ik}$. The other relations can be verified in a similar manner.


11.4 $2^r$ Factorial Experiments
1. Let $\theta_1$ be the effect of block 1 and $\theta_2 = -\theta_1$ be the effect of block 2; then the mean
$\mu_{ijk}$ of $X_{ijk}$ is
$$\mu_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \theta_1,$$
if (i, j, k) is one of (1, 1, 1), (2, 2, 1), (2, 1, 2), (1, 2, 2), and the same expression
with $\theta_1$ replaced by $-\theta_1$ if (i, j, k) is one of the other four sets of indices.


Then $\hat{\alpha}_1$ is the contrast
$$\frac{X_{111} - X_{211} + X_{121} - X_{221} + X_{112} - X_{212} + X_{122} - X_{222}}{8}.$$


In the terms $X_{111} - X_{221}$ and $-X_{212} + X_{122}$, $\theta_1$ is cancelled. Similarly, in the other
four terms, $\theta_2$ is cancelled.


In the same manner, we can verify that the block effect is not confounded with the
other main effects and two-factor interaction effects. However, for the three-factor
effect, the estimate of $(\alpha\beta\gamma)_{111}$ is the contrast
$$\frac{X_{111} - X_{211} - X_{121} + X_{221} - X_{112} + X_{212} + X_{122} - X_{222}}{8},$$
where the four positive terms $X_{111}$, $X_{221}$, $X_{212}$, and $X_{122}$ contribute $4\theta_1$, and the
four negative terms $-X_{211}$, $-X_{121}$, $-X_{112}$, and $-X_{222}$ contribute $-4\theta_2 = 4\theta_1$. Thus
the block effect cannot be cancelled. Hence, the block effect is confounded with the
three-factor interaction effect.
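
The cancellation argument can be checked numerically for all seven effects at once. The following is a minimal sketch, assuming the coding +1 for level 1 and -1 for level 2 of each factor and the block assignment stated above; the names `contrasts`, `theta_sign`, and `leak` are chosen here for illustration. For each effect it computes the coefficient with which $\theta_1$ enters the estimated effect.

```r
# All eight treatment combinations (i, j, k), each index in {1, 2}
runs <- expand.grid(i = 1:2, j = 1:2, k = 1:2)
s <- 3 - 2 * as.matrix(runs)   # level 1 -> +1, level 2 -> -1

# Contrast coefficients (signs) for every main effect and interaction
contrasts <- cbind(A = s[, "i"], B = s[, "j"], C = s[, "k"],
                   AB = s[, "i"] * s[, "j"],
                   AC = s[, "i"] * s[, "k"],
                   BC = s[, "j"] * s[, "k"],
                   ABC = s[, "i"] * s[, "j"] * s[, "k"])

# Block 1 holds (1,1,1), (2,2,1), (2,1,2), (1,2,2); those runs carry +theta_1,
# the remaining runs carry theta_2 = -theta_1
block1 <- with(runs, (i == 1 & j == 1 & k == 1) | (i == 2 & j == 2 & k == 1) |
                     (i == 2 & j == 1 & k == 2) | (i == 1 & j == 2 & k == 2))
theta_sign <- ifelse(block1, 1, -1)

# Coefficient of theta_1 in each estimated effect (contrast divided by 8)
leak <- colSums(contrasts * theta_sign) / 8
print(leak)
```

A coefficient of 0 means the block effect cancels in that contrast; a nonzero coefficient means the effect is confounded with blocks, which here happens only for ABC.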


2. In the following, we assume the block effect for block 1 is $\theta_1$ and for block 2 is
$\theta_2 = -\theta_1$.


(a) With the given command, we will assign (1, b) to block 1 and (a, ab) to block
2. The estimated main effect of factor A is given by
$$\frac{Y_{11} - Y_{21} + Y_{12} - Y_{22}}{4}.$$


The positive terms $Y_{11}$ and $Y_{12}$ are in block 1 and contribute $2\theta_1$; the negative
terms $-Y_{21}$ and $-Y_{22}$ are in block 2 and contribute $-2\theta_2 = 2\theta_1$. Thus, the
block effect cannot be cancelled and it will be confounded with the main effect
of factor A.


(b) With the given command, we will assign (1, a, bc, abc) to block 1 and (b, ab, c, ac)
to block 2. The estimated BC interaction effect is
$$\frac{Y_{111} + Y_{211} - Y_{121} - Y_{221} - Y_{112} - Y_{212} + Y_{122} + Y_{222}}{8}.$$


The positive terms $Y_{111}$, $Y_{211}$, $Y_{122}$, and $Y_{222}$ are all in block 1 and they
contribute $4\theta_1$; the negative terms $-Y_{121}$, $-Y_{221}$, $-Y_{112}$, and $-Y_{212}$ are in block 2
and they contribute $-4\theta_2 = 4\theta_1$. Thus, the block effect cannot be cancelled
and it will be confounded with the interaction effect BC.
3. (a) $ABC \cdot CDE = ABDE$
(b) $BCD \cdot CDE = BE$
(c) The R commands for part (a)
G=rbind(c(1, 1, 1, 0, 0), c(0, 0, 1, 1, 1)); conf.design(G, p=2)
The R commands for part (b)
G=rbind(c(0, 1, 1, 1, 0), c(0, 0, 1, 1, 1)); conf.design(G, p=2)
4. (a) There are $2^3 = 8$ blocks; therefore, there must be $2^3 - 1 = 7$ effects to be
confounded with the block effects.


(b) The other four effects to be confounded with the block effects are $ABC \cdot BCD =
AD$, $ABC \cdot CDE = ABDE$, $BCD \cdot CDE = BE$, and $ABC \cdot BCD \cdot CDE = ACE$
(letters appearing an even number of times cancel; C appears three times in the last product, so it remains).


(c) The R commands are


G=rbind(c(1, 1, 1, 0, 0), c(0, 1, 1, 1, 0), c(0, 0, 1, 1, 1)); conf.design(G, p=2)
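
The generalized interactions can be enumerated with arithmetic modulo 2 on the rows of G. This is a minimal base-R sketch that does not rely on `conf.design`; the names `combos`, `exponents`, and `confounded` are chosen here for illustration. Each effect is a 0/1 vector over the factors A to E, and multiplying effects corresponds to adding their vectors modulo 2.

```r
factors <- c("A", "B", "C", "D", "E")

# Defining effects, one row per generator (same rows as the matrix G above)
G <- rbind(ABC = c(1, 1, 1, 0, 0),
           BCD = c(0, 1, 1, 1, 0),
           CDE = c(0, 0, 1, 1, 1))

# Every non-empty selection of generators (drop the all-zero first row)
combos <- as.matrix(expand.grid(rep(list(0:1), nrow(G))))[-1, , drop = FALSE]

# Product of the selected generators: add exponent vectors modulo 2
exponents <- (combos %*% G) %% 2

# Translate each 0/1 exponent vector back into an effect name
confounded <- apply(exponents, 1,
                    function(e) paste(factors[e == 1], collapse = ""))
print(confounded)
```

Three generators give $2^3 - 1 = 7$ confounded effects: the generators themselves, their three pairwise products, and the product of all three.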
5. (a) The R commands are


sr = read.table("SurfRoughOptim.txt", header = T); attach(sr)
b = c(rep(1, 16)); b[c(2,3,5,8,10,11,13,16)] = 2; sr$block = b
anova(aov(y~block+A*B*C, data=sr))


The obtained ANOVA table is


            DF      SS      MS         F   P
block        1    0.36    0.36    0.3251   0.584237
A            1  495.06  495.06  447.0090   $2.632 \times 10^{-8}$
B            1  486.20  486.20  439.0090   $2.825 \times 10^{-8}$
C            1   90.25   90.25   81.4898   $1.812 \times 10^{-5}$
A:B          1   13.69   13.69   12.3612   0.007894
A:C          1    0.56    0.56    0.5079   0.496307
B:C          1    1.10    1.10    0.9955   0.347623
Residuals    8    8.86    1.11


(b) From the ANOVA table, it is clear that the three main effects and the AB
interaction effect are significantly different from zero.


6. First use the command G = rbind(c(1,1,0), c(1,0,1)); conf.design(G,p=2) to get
the block allocation, then apply the commands


sr = read.table("SurfRoughOptim.txt", header = T); attach(sr)
b = c(rep(4, 16)); b[c(1,8,9,16)] = 1; b[c(2,7,10,15)] = 2; b[c(3,6,11,14)] = 3;
sr$block = b
anova(aov(y~block+A*B*C, data=sr))


The obtained ANOVA table is


            DF      SS      MS         F   P
block        1    1.62    1.62    1.4668   0.260409
A            1  495.06  495.06  447.0090   $2.632 \times 10^{-8}$
B            1  486.20  486.20  439.0090   $2.825 \times 10^{-8}$
C            1   90.25   90.25   81.4898   $1.812 \times 10^{-5}$
A:B          1   13.69   13.69   12.3612   0.007894
A:C          1    0.04    0.04    0.0366   0.853110
A:B:C        1    0.36    0.36    0.3251   0.584237
Residuals    8    8.86    1.11


From the ANOVA table, it is clear that the three main effects and the AB interaction
effect are significantly different from zero.


7. (a) The R commands are
G=rbind(c(1, 1, 0, 0), c(0, 0, 1, 1)); conf.design(G, p=2)
(b) The R commands are
AW = read.table("ArcWeld.txt", header = T); attach(AW);
w = c(rep(4,16)); w[c(1,4,13,16)] = 1; w[c(2,3,14,15)] = 2;
w[c(5,8,9,12)] = 3; AW$block = w


(c) Using the command anova(aov(y~block+A*B*C*D, data=AW)) gives the ANOVA
table

            DF     SS     MS         F   P
block        1    102    102    0.3103   0.5852
A            1  10513  10513   31.8561   $3.660 \times 10^{-5}$
B            1  80802  80802  244.8545   $4.046 \times 10^{-11}$
C            1     50     50    0.1515   0.7022
D            1     91     91    0.2761   0.6065
A:B          1    664    664    2.0128   0.1752
A:C          1     25     25    0.0742   0.7887
B:C          1     61     61    0.1833   0.6742
A:D          1    153    153    0.4640   0.5055
B:D          1    231    231    0.7004   0.4150
A:B:C        1     12     12    0.0379   0.8481
A:B:D        1      6      6    0.0186   0.8933
A:C:D        1     78     78    0.2367   0.6332
B:C:D        1     45     45    0.1367   0.7164
A:B:C:D      1     36     36    0.1095   0.7450
Residuals   16   5280    330


From the ANOVA table, it is clear that the main effects of the factors A and 
B are significantly different from zero. 


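8. First, we consider the contrast
$$\frac{\bar{x}_{111} - \bar{x}_{221} + \bar{x}_{212} - \bar{x}_{122}}{4}.$$

It estimates (the overall mean $\mu$ is omitted from each bracket because it cancels in the contrast)
$$\begin{aligned}
\frac{\mu_{111} - \mu_{221} + \mu_{212} - \mu_{122}}{4}
&= \tfrac{1}{4}[\alpha_1 + \beta_1 + \gamma_1 + (\alpha\beta)_{11} + (\alpha\gamma)_{11} + (\beta\gamma)_{11} + (\alpha\beta\gamma)_{111}] \\
&\quad - \tfrac{1}{4}[\alpha_2 + \beta_2 + \gamma_1 + (\alpha\beta)_{22} + (\alpha\gamma)_{21} + (\beta\gamma)_{21} + (\alpha\beta\gamma)_{221}] \\
&\quad + \tfrac{1}{4}[\alpha_2 + \beta_1 + \gamma_2 + (\alpha\beta)_{21} + (\alpha\gamma)_{22} + (\beta\gamma)_{12} + (\alpha\beta\gamma)_{212}] \\
&\quad - \tfrac{1}{4}[\alpha_1 + \beta_2 + \gamma_2 + (\alpha\beta)_{12} + (\alpha\gamma)_{12} + (\beta\gamma)_{22} + (\alpha\beta\gamma)_{122}] \\
&= \tfrac{1}{4}[\alpha_1 + \beta_1 + \gamma_1 + (\alpha\beta)_{11} + (\alpha\gamma)_{11} + (\beta\gamma)_{11} + (\alpha\beta\gamma)_{111}] \\
&\quad - \tfrac{1}{4}[-\alpha_1 - \beta_1 + \gamma_1 + (\alpha\beta)_{11} - (\alpha\gamma)_{11} - (\beta\gamma)_{11} + (\alpha\beta\gamma)_{111}] \\
&\quad + \tfrac{1}{4}[-\alpha_1 + \beta_1 - \gamma_1 - (\alpha\beta)_{11} + (\alpha\gamma)_{11} - (\beta\gamma)_{11} + (\alpha\beta\gamma)_{111}] \\
&\quad - \tfrac{1}{4}[\alpha_1 - \beta_1 - \gamma_1 - (\alpha\beta)_{11} - (\alpha\gamma)_{11} + (\beta\gamma)_{11} + (\alpha\beta\gamma)_{111}] \\
&= \beta_1 + (\alpha\gamma)_{11},
\end{aligned}$$
where the second equality uses the sum-to-zero constraints (for example, $\alpha_2 = -\alpha_1$ and $(\alpha\beta)_{22} = (\alpha\beta)_{11}$).

This shows that $\beta_1$ is confounded with $(\alpha\gamma)_{11}$.


Next, consider the contrast
$$\frac{\bar{x}_{111} + \bar{x}_{221} - \bar{x}_{212} - \bar{x}_{122}}{4}.$$

It estimates
$$\begin{aligned}
\frac{\mu_{111} + \mu_{221} - \mu_{212} - \mu_{122}}{4}
&= \tfrac{1}{4}[\alpha_1 + \beta_1 + \gamma_1 + (\alpha\beta)_{11} + (\alpha\gamma)_{11} + (\beta\gamma)_{11} + (\alpha\beta\gamma)_{111}] \\
&\quad + \tfrac{1}{4}[\alpha_2 + \beta_2 + \gamma_1 + (\alpha\beta)_{22} + (\alpha\gamma)_{21} + (\beta\gamma)_{21} + (\alpha\beta\gamma)_{221}] \\
&\quad - \tfrac{1}{4}[\alpha_2 + \beta_1 + \gamma_2 + (\alpha\beta)_{21} + (\alpha\gamma)_{22} + (\beta\gamma)_{12} + (\alpha\beta\gamma)_{212}] \\
&\quad - \tfrac{1}{4}[\alpha_1 + \beta_2 + \gamma_2 + (\alpha\beta)_{12} + (\alpha\gamma)_{12} + (\beta\gamma)_{22} + (\alpha\beta\gamma)_{122}] \\
&= \tfrac{1}{4}[\alpha_1 + \beta_1 + \gamma_1 + (\alpha\beta)_{11} + (\alpha\gamma)_{11} + (\beta\gamma)_{11} + (\alpha\beta\gamma)_{111}] \\
&\quad + \tfrac{1}{4}[-\alpha_1 - \beta_1 + \gamma_1 + (\alpha\beta)_{11} - (\alpha\gamma)_{11} - (\beta\gamma)_{11} + (\alpha\beta\gamma)_{111}] \\
&\quad - \tfrac{1}{4}[-\alpha_1 + \beta_1 - \gamma_1 - (\alpha\beta)_{11} + (\alpha\gamma)_{11} - (\beta\gamma)_{11} + (\alpha\beta\gamma)_{111}] \\
&\quad - \tfrac{1}{4}[\alpha_1 - \beta_1 - \gamma_1 - (\alpha\beta)_{11} - (\alpha\gamma)_{11} + (\beta\gamma)_{11} + (\alpha\beta\gamma)_{111}] \\
&= \gamma_1 + (\alpha\beta)_{11}.
\end{aligned}$$

This shows that $\gamma_1$ is confounded with $(\alpha\beta)_{11}$.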
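9. Consider the contrast

\[ \frac{-\bar{x}_{211} + \bar{x}_{121} + \bar{x}_{112} - \bar{x}_{222}}{4}, \]

which estimates

\[
\begin{aligned}
\frac{-\mu_{211} + \mu_{121} + \mu_{112} - \mu_{222}}{4}
&= -\tfrac{1}{4}[\alpha_2 + \beta_1 + \gamma_1 + (\alpha\beta)_{21} + (\alpha\gamma)_{21} + (\beta\gamma)_{11} + (\alpha\beta\gamma)_{211}] \\
&\quad + \tfrac{1}{4}[\alpha_1 + \beta_2 + \gamma_1 + (\alpha\beta)_{12} + (\alpha\gamma)_{11} + (\beta\gamma)_{21} + (\alpha\beta\gamma)_{121}] \\
&\quad + \tfrac{1}{4}[\alpha_1 + \beta_1 + \gamma_2 + (\alpha\beta)_{11} + (\alpha\gamma)_{12} + (\beta\gamma)_{12} + (\alpha\beta\gamma)_{112}] \\
&\quad - \tfrac{1}{4}[\alpha_2 + \beta_2 + \gamma_2 + (\alpha\beta)_{22} + (\alpha\gamma)_{22} + (\beta\gamma)_{22} + (\alpha\beta\gamma)_{222}] \\
&= -\tfrac{1}{4}[-\alpha_1 + \beta_1 + \gamma_1 - (\alpha\beta)_{11} - (\alpha\gamma)_{11} + (\beta\gamma)_{11} - (\alpha\beta\gamma)_{111}] \\
&\quad + \tfrac{1}{4}[\alpha_1 - \beta_1 + \gamma_1 - (\alpha\beta)_{11} + (\alpha\gamma)_{11} - (\beta\gamma)_{11} - (\alpha\beta\gamma)_{111}] \\
&\quad + \tfrac{1}{4}[\alpha_1 + \beta_1 - \gamma_1 + (\alpha\beta)_{11} - (\alpha\gamma)_{11} - (\beta\gamma)_{11} - (\alpha\beta\gamma)_{111}] \\
&\quad - \tfrac{1}{4}[-\alpha_1 - \beta_1 - \gamma_1 + (\alpha\beta)_{11} + (\alpha\gamma)_{11} + (\beta\gamma)_{11} - (\alpha\beta\gamma)_{111}] \\
&= \alpha_1 - (\beta\gamma)_{11}.
\end{aligned}
\]

This shows that $\alpha_1$ is confounded with $(\beta\gamma)_{11}$.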


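Consider the contrast

\[ \frac{\bar{x}_{211} - \bar{x}_{121} + \bar{x}_{112} - \bar{x}_{222}}{4}, \]

which estimates

\[
\begin{aligned}
\frac{\mu_{211} - \mu_{121} + \mu_{112} - \mu_{222}}{4}
&= \tfrac{1}{4}[\alpha_2 + \beta_1 + \gamma_1 + (\alpha\beta)_{21} + (\alpha\gamma)_{21} + (\beta\gamma)_{11} + (\alpha\beta\gamma)_{211}] \\
&\quad - \tfrac{1}{4}[\alpha_1 + \beta_2 + \gamma_1 + (\alpha\beta)_{12} + (\alpha\gamma)_{11} + (\beta\gamma)_{21} + (\alpha\beta\gamma)_{121}] \\
&\quad + \tfrac{1}{4}[\alpha_1 + \beta_1 + \gamma_2 + (\alpha\beta)_{11} + (\alpha\gamma)_{12} + (\beta\gamma)_{12} + (\alpha\beta\gamma)_{112}] \\
&\quad - \tfrac{1}{4}[\alpha_2 + \beta_2 + \gamma_2 + (\alpha\beta)_{22} + (\alpha\gamma)_{22} + (\beta\gamma)_{22} + (\alpha\beta\gamma)_{222}] \\
&= \tfrac{1}{4}[-\alpha_1 + \beta_1 + \gamma_1 - (\alpha\beta)_{11} - (\alpha\gamma)_{11} + (\beta\gamma)_{11} - (\alpha\beta\gamma)_{111}] \\
&\quad - \tfrac{1}{4}[\alpha_1 - \beta_1 + \gamma_1 - (\alpha\beta)_{11} + (\alpha\gamma)_{11} - (\beta\gamma)_{11} - (\alpha\beta\gamma)_{111}] \\
&\quad + \tfrac{1}{4}[\alpha_1 + \beta_1 - \gamma_1 + (\alpha\beta)_{11} - (\alpha\gamma)_{11} - (\beta\gamma)_{11} - (\alpha\beta\gamma)_{111}] \\
&\quad - \tfrac{1}{4}[-\alpha_1 - \beta_1 - \gamma_1 + (\alpha\beta)_{11} + (\alpha\gamma)_{11} + (\beta\gamma)_{11} - (\alpha\beta\gamma)_{111}] \\
&= \beta_1 - (\alpha\gamma)_{11}.
\end{aligned}
\]

This shows that $\beta_1$ is confounded with $(\alpha\gamma)_{11}$.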


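Consider the contrast

\[ \frac{\bar{x}_{211} + \bar{x}_{121} - \bar{x}_{112} - \bar{x}_{222}}{4}, \]

which estimates

\[
\begin{aligned}
\frac{\mu_{211} + \mu_{121} - \mu_{112} - \mu_{222}}{4}
&= \tfrac{1}{4}[\alpha_2 + \beta_1 + \gamma_1 + (\alpha\beta)_{21} + (\alpha\gamma)_{21} + (\beta\gamma)_{11} + (\alpha\beta\gamma)_{211}] \\
&\quad + \tfrac{1}{4}[\alpha_1 + \beta_2 + \gamma_1 + (\alpha\beta)_{12} + (\alpha\gamma)_{11} + (\beta\gamma)_{21} + (\alpha\beta\gamma)_{121}] \\
&\quad - \tfrac{1}{4}[\alpha_1 + \beta_1 + \gamma_2 + (\alpha\beta)_{11} + (\alpha\gamma)_{12} + (\beta\gamma)_{12} + (\alpha\beta\gamma)_{112}] \\
&\quad - \tfrac{1}{4}[\alpha_2 + \beta_2 + \gamma_2 + (\alpha\beta)_{22} + (\alpha\gamma)_{22} + (\beta\gamma)_{22} + (\alpha\beta\gamma)_{222}] \\
&= \tfrac{1}{4}[-\alpha_1 + \beta_1 + \gamma_1 - (\alpha\beta)_{11} - (\alpha\gamma)_{11} + (\beta\gamma)_{11} - (\alpha\beta\gamma)_{111}] \\
&\quad + \tfrac{1}{4}[\alpha_1 - \beta_1 + \gamma_1 - (\alpha\beta)_{11} + (\alpha\gamma)_{11} - (\beta\gamma)_{11} - (\alpha\beta\gamma)_{111}] \\
&\quad - \tfrac{1}{4}[\alpha_1 + \beta_1 - \gamma_1 + (\alpha\beta)_{11} - (\alpha\gamma)_{11} - (\beta\gamma)_{11} - (\alpha\beta\gamma)_{111}] \\
&\quad - \tfrac{1}{4}[-\alpha_1 - \beta_1 - \gamma_1 + (\alpha\beta)_{11} + (\alpha\gamma)_{11} + (\beta\gamma)_{11} - (\alpha\beta\gamma)_{111}] \\
&= \gamma_1 - (\alpha\beta)_{11}.
\end{aligned}
\]

This shows that $\gamma_1$ is confounded with $(\alpha\beta)_{11}$.
In summary, the alias pairs are [A, BC], [B, AC], [C, AB].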
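The confounding pattern in Exercises 8 and 9 can also be checked numerically. The sketch below is only an illustration: it assumes a +1/-1 coding (level 1 as +1, level 2 as -1), and the helper name `code` is hypothetical.

```r
# Code each factor as +1 (level 1) or -1 (level 2); this coding is an
# assumption made for illustration.
code <- function(cells) {
  m <- 3 - 2 * do.call(rbind, lapply(strsplit(cells, ""), as.integer))
  colnames(m) <- c("A", "B", "C")
  rownames(m) <- cells
  m
}

half8 <- code(c("111", "221", "212", "122"))  # cells used in Exercise 8
half9 <- code(c("211", "121", "112", "222"))  # cells used in Exercise 9

# Exercise 8: every run has ABC = +1, so B coincides with AC and C with AB.
all(half8[, "B"] == half8[, "A"] * half8[, "C"])
all(half8[, "C"] == half8[, "A"] * half8[, "B"])

# Exercise 9: every run has ABC = -1, so each main effect equals minus the
# two-factor interaction of the other two factors.
all(half9[, "A"] == -half9[, "B"] * half9[, "C"])
```

Because the contrast columns coincide up to sign within each half fraction, no contrast of the four cell means can separate a main effect from its aliased interaction.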
10. (a) The design table is given as follows:
A  B  C  D  E(=ABCD)
The set of 15 aliased pairs is
[A, BCDE], [B, ACDE], [C, ABDE], [D, ABCE], [E, ABCD],
[AB, CDE], [AC, BDE], [AD, BCE], [AE, BCD], [BC, ADE],
[BD, ACE], [BE, ACD], [CD, ABE], [CE, ABD], [DE, ABC].
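A design table of this kind can be generated in base R. The following is a minimal sketch, assuming -1/+1 coding of the factor levels; the object name `design` is chosen here for illustration and is not from the exercise.

```r
# Full 2^4 factorial in A, B, C, D with levels coded -1/+1.
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1), D = c(-1, 1))

# The generator E = ABCD gives the 2^(5-1) half fraction with I = ABCDE.
design$E <- with(design, A * B * C * D)
design

# Within the fraction, A cannot be separated from BCDE, nor AB from CDE.
with(design, all(A == B * C * D * E))
with(design, all(A * B == C * D * E))
```

Multiplying any effect by the defining word ABCDE gives its alias, which is how the 15 pairs listed above arise.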
(b) The design table is given as follows:

A  B  C  D  E(=ABC)  F(=BCD)


The set of 15 aliased groups is
[A, BCE, DEF, ABCDF], [B, ACE, CDF, ABDEF], [C, ABE, BDF, ACDEF],
[D, ABCDE, BCF, AEF], [E, ABC, ADF, BCDEF], [F, BCD, ADE, ABCEF],
[AB, CE, ACDF, BDEF], [AC, BE, ABDF, CDEF], [AD, BCDE, ABCF, EF],
[AE, BC, DF, ABCDEF], [BD, ACDE, CF, ABEF], [ABF, CEF, ACD, BDE],
[CD, ABDE, BF, ACEF], [DE, ABCD, BCEF, AF], [ABD, CDE, ACF, BEF].


(c) The design table is omitted. There are (2^7 − 4)/4 = 31 groups of four aliased
effects.


11. (a) The set of alias pairs is [A, BCDE], [B, ACDE], [C, ABDE], [D, ABCE],
[E, ABCD], [AB, CDE], [AC, BDE], [AD, BCE], [AE, BCD], [BC, ADE],
[BD, ACE], [BE, ACD], [CD, ABE], [CE, ABD], [DE, ABC].


(b) With the data read into the data frame df, the sums of squares of the classes of
aliased effects can be obtained by the command anova(aov(y~A*B*C*D*E,
data=df)). It is not possible to test for the significance of the effects because
there are no degrees of freedom for the error sum of squares.
(c) Using the command given in the hint, the obtained ANOVA table is

            DF       SS       MS         F   P
A            1   500.64   500.64  3560.111   0.010669
B            1     0.14     0.14     1.000   0.500000
C            1   489.52   489.52  3481.000   0.010789
D            1   185.64   185.64  1320.111   0.017517
E            1   293.27   293.27  2085.444   0.013938
A:B          1  1233.77  1233.77  8773.444   0.006796
A:D          1    13.14    13.14    93.444   0.065624
A:E          1    26.27    26.27   186.778   0.046499
B:C          1    21.39    21.39   152.111   0.051505
B:D          1    26.27    26.27   186.778   0.046499
B:E          1    43.89    43.89   312.111   0.035997
C:D          1   213.89   213.89  1521.000   0.016320
C:E          1    50.77    50.77   361.000   0.033475
D:E          1   293.27   293.27  2085.444   0.013938
Residuals    1     0.14     0.14

At significance level 0.05, all effects except B, AD, and BC are significant.
Polynomial and Multiple Regression


12.2 The Multiple Linear Regression Model


1. (a) $\mu_{Y|X_1,X_2}(12, 25) = 3.6 + 2.7 \times 12 + 0.9 \times 25 = 58.5$.

(b) $E(Y) = 3.6 + 2.7E(X_1) + 0.9E(X_2) = 3.6 + 2.7 \times 10 + 0.9 \times 18 = 46.8$.

(c) When $X_1$ increases by one unit while $X_2$ remains fixed, the expected value of $Y$ increases by 2.7.

(d) Clearly, $\beta_1 = 2.7$, $\beta_2 = 0.9$, and $\beta_0 = E(Y) = 46.8$.


2. (a) Since $\mathrm{Cov}(X_1, X_2) = E(X_1X_2) - E(X_1)E(X_2)$, we have $E(X_1X_2) = \mathrm{Cov}(X_1, X_2) + E(X_1)E(X_2) = 80 + 10 \times 18 = 260$. Thus, $E(Y) = 3.6 + 2.7E(X_1) + 0.9E(X_2) + 1.5E(X_1X_2) = 3.6 + 2.7 \times 10 + 0.9 \times 18 + 1.5 \times 260 = 436.8$.

(b) Since $\beta_3$ is the coefficient of the term $X_1X_2$, comparison with the original model gives $\beta_3 = 1.5$. Taking expectations in the centered model,

\[
\begin{aligned}
E(Y) &= \beta_0 + \beta_1E(X_1 - \mu_{X_1}) + \beta_2E(X_2 - \mu_{X_2}) + \beta_3E[(X_1 - \mu_{X_1})(X_2 - \mu_{X_2})] \\
&= \beta_0 + \beta_3\mathrm{Cov}(X_1, X_2),
\end{aligned}
\]

which gives $\beta_0 = E(Y) - \beta_3\mathrm{Cov}(X_1, X_2)$. From the information in part (a), $\beta_0 = E(Y) - \beta_3\mathrm{Cov}(X_1, X_2) = 436.8 - 1.5 \times 80 = 316.8$.


3. (a) The rate of change of the regression function at $x$ is

\[ \frac{d\mu_{Y|X}(x)}{dx} = -3.2 + 1.4x. \]

Thus, the rates of change at $x = 0$, 2, and 3 are −3.2, −0.4, and 1, respectively.

(b) $\beta_2$ is the coefficient of the $x^2$ term, thus $\beta_2 = 0.7$. In the centered model, the coefficient of $x$ is $\beta_1 - 2\beta_2\mu_X$, which must be −3.2. Thus, $\beta_1 = -3.2 + 2\beta_2\mu_X = -3.2 + 2 \times 0.7 \times 2 = -0.4$. Finally, $\beta_0 = \mu_{Y|X}(\mu_X) = -8.5 - 3.2 \times 2 + 0.7 \times 2^2 = -12.1$.


4. (a) The scatterplots for (x, y) and (log(x), log(y)) are given on the next page. The
scatterplot of (log(x), log(y)) suggests a linear relation.
(b) Using the log-transformed data, we can fit the linear regression model as

\[ \log \hat{y} = 7.1458 - 0.5118 \log(x). \]

(c) When the per capita income is 4000, the infant mortality rate is predicted as

\[ \hat{y} = \exp(7.1458 - 0.5118 \log(4000)) = 18.19. \]
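The back-transformation in part (c) is easy to reproduce. The sketch below simply plugs the rounded coefficients reported in part (b) into the fitted equation; the function name `predict_rate` is hypothetical.

```r
# Coefficients of the fitted log-log model reported in part (b).
b0 <- 7.1458
b1 <- -0.5118

# Predict on the log scale, then exponentiate to return to the original scale.
predict_rate <- function(income) exp(b0 + b1 * log(income))

predict_rate(4000)
```

Note that `log()` in R is the natural logarithm, which matches the use of `exp()` in the back-transformation.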


5. (a) The scatterplots for (t, y) and (t, log(y)) are given on the next page. The
scatterplot of (t, log(y)) suggests a linear relation.
(b) Using the log-transformed data, we can fit the linear regression model as

\[ \log \hat{y} = 5.9732 - 0.2184t. \]

Thus, to predict the bacteria count Y at time t, we can use

\[ \hat{y} = \exp(5.9732 - 0.2184t). \]


6. (a) The scatterplots for (x, y) and (1/x, y) are given on the next page. The
scatterplot of (1/x, y) suggests a linear relation.
(b) Using the transformed data, we can fit the linear regression model as

\[ \hat{y} = 2.979 - 6.935/x. \]

(c) When the wind speed is 8 miles per hour, the current produced is predicted as

\[ \hat{y} = 2.979 - 6.935/8 = 2.112125. \]


12.3 Estimating, Testing, and Prediction 


1. (a) Using the R commands GSP = read.table("GasStatPoll.txt", header = T);
attach(GSP); fit = lm(MTBE~GS+WS+T), we get the fitted model

\[ \widehat{MTBE} = 10.4576978 - 0.0002323\,GS - 4.7198342\,WS + 0.0033323\,T. \]

The $R^2$ value is reported as 0.8931 and the p-value for the model utility test
is 0.02064; thus the model is useful for predicting MTBE concentrations.


(b) The fitted value and the residual corresponding to the first observation are 
5.9367 and -1.0367, respectively. 


(c) To test the normality assumption, we use the command shapiro.test(rstandard(fit)),
which returns a p-value of 0.9256; this suggests that the normality assumption
is not contradicted by the data. The QQ plot is produced by the commands
qqnorm(rstandard(fit)); qqline(rstandard(fit)) and is given below.

[Figure: Normal Q-Q plot of the standardized residuals (Theoretical Quantiles vs. Sample Quantiles).]

The QQ plot is consistent with the p-value result. To test the homoscedasticity
assumption, we use the commands r1=lm(abs(rstandard(fit))~poly(fitted(fit),2));
summary(r1) and we get a p-value of 0.7506, which suggests that the
homoscedasticity assumption is not contradicted by the data. The plot of the
absolute standardized residuals versus the fitted values is produced by the command
plot(fitted(fit),abs(rstandard(fit))) and is shown on the next page.
This plot is consistent with the p-value result.

(d) The command summary(fit) returns the p-value for testing the significance of
each predictor. Only the predictor “Wind Speed” is significant at level 0.05
(with a p-value of 0.026).

(e) The command confint(fit) returns the 95% confidence intervals for the intercept
and the coefficients of GS, WS, and T, which are respectively (1.295, 19.620),
(-0.0022, 0.0017), (-8.515, -0.925), and (-0.1707, 0.1774).

2. (a) Using the R command fit = lm(y~x1+x2+x3), we get the fitted model

\[ \hat{y} = 17.5238 + 0.7156x_1 + 1.2953x_2 - 0.1521x_3. \]

The adjusted $R^2$ value is reported as 0.8983 and the p-value for the model
utility test is $3.016 \times 10^{-9}$; thus the model is useful for predicting stackloss.


(b) The p-value for testing the significance of x3 is 0.34405; thus it is not a useful
predictor in the model. We use the command fitR=lm(y~x1+x2) to fit the MLR
model using only x1 and x2, with no polynomial or interaction terms. The
adjusted $R^2$ value is reported as 0.8986, very close to the adjusted $R^2$ value
in part (a). This is consistent with the conclusion from the p-value that x3 is
not useful.


(c) To get the requested confidence interval, we use the following commands

y=stackloss$stack.loss; x1=stackloss$Air.Flow;
x2=stackloss$Water.Temp; x3=stackloss$Acid.Conc.;
m1 = mean(x1); m2 = mean(x2); m3 = mean(x3);
x1=x1-m1; x2=x2-m2; x3=x3-m3; fitR=lm(y~x1+x2);
predict(fitR, data.frame(x1=65-m1, x2=20-m2), interval="confidence").

The CI is obtained as (16.72191, 21.62454).


(d) To fit the MLR model based on second order polynomials for x1 and x2, as
well as their interaction, we use the command

fitF=lm(y~poly(x1,2,raw=T)+poly(x2,2,raw=T)+x1:x2).

The fitted model is

\[ \hat{y} = 17.19405 + 0.62793x_1 - 0.03857x_1^2 + 1.12312x_2 - 0.06624x_2^2 + 0.18757x_1x_2. \]

To test the joint (i.e., as a group) significance of the two quadratic terms and
the interaction, we use the command anova(fitR, fitF), which returns a p-value
of 0.07238. Since this exceeds 0.05, the two quadratic terms and the interaction
are not jointly significant at level 0.05.


(e) The scatterplot matrix for the data is given below.
From this figure, “Acid.Conc.” appears correlated with “stack.loss.” The high
p-value for “Acid.Conc.” in part (a) arises because the effect of “Acid.Conc.” is
already explained by “Air.Flow” and “Water.Temp.”


3. (a) The estimated regression model is


\[
\begin{aligned}
\hat{y} = {} & 70.94 + 5.18 \times 10^{-5}x_1 - 2.18 \times 10^{-5}x_2 + 3.382 \times 10^{-2}x_3 \\
& - 0.3011x_4 + 4.893 \times 10^{-2}x_5 - 5.735 \times 10^{-3}x_6 - 7.383 \times 10^{-8}x_7
\end{aligned}
\]

and $R^2_{adj} = 0.6922$. The p-value for the model utility test is $2.534 \times 10^{-10}$. The
model is useful for predicting life expectancy.
(b) Using the given command to get h2 and then the command anova(h1, h2) gives a
p-value of 0.9993 for testing the joint significance. Thus, the variables “Income,”
“Illiteracy,” and “Area” are not jointly significant at level 0.05.


(c) The $R^2$ values for the full and reduced models are 0.7362 and 0.736, respectively.
They are nearly equal because the variables “Income,” “Illiteracy,” and “Area” are
not significant. The $R^2_{adj}$ values for the full and reduced models are 0.6922 and
0.7126, respectively. The reduced model has the larger value because its almost
identical $R^2$ is adjusted for fewer predictors.


(d) To test the normality assumption, we use the command shapiro.test(rstandard(h2)),
which returns a p-value of 0.5606; this suggests that the normality assumption
is not contradicted by the data. The QQ plot is produced by the commands
qqnorm(rstandard(h2)); qqline(rstandard(h2)) and is given below.

[Figure: Normal Q-Q plot of the standardized residuals (Theoretical Quantiles vs. Sample Quantiles).]

The QQ plot is consistent with the p-value result. To test the homoscedasticity
assumption, we use the commands r1=lm(abs(rstandard(h2))~poly(fitted(h2),2));
summary(r1) and we get a p-value of 0.7113, which suggests that the
homoscedasticity assumption is not contradicted by the data. The plot of the
absolute standardized residuals versus the fitted values is produced by the command
plot(fitted(h2),abs(rstandard(h2))) and is shown below.

[Figure: abs(rstandard(h2)) versus fitted(h2).]

This plot is consistent with the p-value result.


(e) The fitted value for CA is given by fitted(h2)[5], which returns 71.796. To
find a prediction for the life expectancy in the state of California with the
murder rate reduced to 5, we use predict(h2, data.frame(Population=21198,
Income=5114, Illiteracy=1.1, Murder=5, HS.Grad=62.6, Frost=20,
Area=156361)) and the returned value is 73.386. To get a 95% prediction
interval for life expectancy with the murder rate reduced to 5, we use
predict(h2, data.frame(Population=21198, Income=5114, Illiteracy=1.1,
Murder=5, HS.Grad=62.6, Frost=20, Area=156361), interval="prediction")
and the prediction interval is given as (71.62966, 75.14321).
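The prediction in part (e) can be sketched with the `state.x77` data that ship with R. This is an illustration only: it assumes that h2 is the reduced model with Population, Murder, HS.Grad, and Frost as predictors, and the names `st`, `ca`, and `pred` are chosen here for convenience.

```r
# state.x77 ships with R; data.frame() turns "Life Exp" into Life.Exp, etc.
st <- data.frame(state.x77)

# Reduced model; assumed here to be what the exercise calls h2.
h2 <- lm(Life.Exp ~ Population + Murder + HS.Grad + Frost, data = st)

ca <- st["California", ]  # observed predictor values for California
ca$Murder <- 5            # hypothetical reduced murder rate

# Point prediction with a 95% prediction interval.
pred <- predict(h2, newdata = ca, interval = "prediction", level = 0.95)
pred
```

Taking the California row from the data frame avoids retyping the predictor values, and columns that are not in the model are simply ignored by `predict()`.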


4. (a) The fitted model is

\[ \hat{y} = 44.97556 + 4.33939x - 0.54887x^2 - 0.05519x^3. \]


The adjusted R? is 0.9648 and the p-value for the model utility test is 1.025 x 
10-™, thus the model is useful. The p-value for testing the significance of x, 
x? and x° are 2.87 x 107°, 5.11 x 10~1°, and 4.72 x 10~°, respectively. Thus 
they are all significant at level 0.01. 


(b) The scatterplot of the data with the fitted curve superimposed is given below.
This plot shows that the fit provided by the 3rd order polynomial is
satisfactory.
(c) The command plot(hc3, which=1) produces the plot below, which suggests
that the fit can be improved.

[Figure: residuals versus fitted values for lm(y ~ x + I(x^2) + I(x^3)).]

For the 5th order polynomial model, the adjusted $R^2$ is 0.9847. When testing
the significance of the coefficients at level 0.01, we found that only the
coefficient of $x^2$ is not significant, with a p-value of 0.216.


(d) Ignoring the term $x^2$, we fit a model by hc52=lm(y~x+I(x^3)+I(x^4)+I(x^5));
summary(hc52). The p-values show that all the coefficients are significant at
level 0.01. The adjusted $R^2$ is 0.984, a little higher than in part
(a). The scatterplot of the data with the fitted curve superimposed is given
below. Compared to the plot in part (b), the plot in part (d) shows that the
fit improves at some points.


5. (a) The commands return the $R^2$ value of 0.962 and the adjusted $R^2$ value of 0.946.
The p-value for the significant test is 2.403 x 10~°, thus it is significant at level 
0.01. 


(b) To test the joint significance of the quadratic and cubic terms, we use the
commands prR = lm(y~x); anova(pr3, prR) and the returned p-value is 9.433 x
10-°. Thus, the quadratic and cubic terms are jointly significant at level 0.01. 


(c) To test the joint significance of the polynomial terms of orders four through
eight, we use the commands pr8=lm(y~poly(x, 8, raw=T)); anova(pr8, pr3).
The commands return a p-value of 0.7917. Thus, the polynomial terms of
orders four through eight are not jointly significant at level 0.01. From the fit
pr8, we have the $R^2$ value 0.9824 and the adjusted $R^2$ value 0.912. Compared
to those in (a), $R^2$ is somewhat bigger but the adjusted $R^2$ is somewhat smaller,
consistent with the non-significance of the higher order polynomial terms.


(d) The following figures show the scatterplot superimposed with the 3rd degree,
8th degree, and 10th degree polynomial fits.
We can observe that as the degree of the polynomial increases, the curve fits
the data better, but the curve becomes less and less smooth.


6. From (12.3.5), $SSR = R^2\,SST$ and $SSE = SST - SSR = (1 - R^2)SST$.
Thus,

\[ F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n-k-1)} = \frac{R^2\,SST/k}{(1-R^2)\,SST/(n-k-1)} = \frac{R^2/k}{(1-R^2)/(n-k-1)}, \]

as was to be proved.
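The identity can be checked numerically on any fitted linear model. The sketch below uses the built-in `stackloss` data purely as a convenient example; the variable names are chosen here for illustration.

```r
fit <- lm(stack.loss ~ ., data = stackloss)  # built-in data, k = 3 predictors
s   <- summary(fit)
n   <- nrow(stackloss)
k   <- length(coef(fit)) - 1
r2  <- s$r.squared

# F statistic computed from R^2 alone.
f_from_r2 <- (r2 / k) / ((1 - r2) / (n - k - 1))

# Compare with the F statistic reported by summary().
c(from_r2 = f_from_r2, from_summary = unname(s$fstatistic["value"]))
```

The two numbers agree up to floating-point error, as the algebra above requires.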


12.4 Additional Topics 


1. (a) By running the given commands, the standard errors of the slope estimates
obtained from OLS and WLS are 0.2209 and 0.1977, respectively.


(b) By running the given commands, the standard deviations of the estimated
slopes in OLS and WLS are 0.5545655 and 0.1916188, respectively.
Comparing with the results in part (a), we see that OLS underestimates the
variability of the estimated slope, while the WLS analysis estimates it correctly.


2. (a) We run the commands


y=stackloss$stack.loss; x1=stackloss$Air.Flow;
x2=stackloss$Water.Temp; x3=stackloss$Acid.Conc.;
x1=x1-mean(x1); x2=x2-mean(x2); x3=x3-mean(x3); fit = lm(y~x1+x2+x3);
r1=lm(abs(rstandard(fit))~poly(fitted(fit),2)); summary(r1).
The code gives a p-value of 0.0377 for the model utility test, suggesting a
violation of the homoscedasticity assumption. The command plot(fitted(fit),
abs(rstandard(fit))) gives the plot below, which is consistent with the formal
test.
(b) To fit the weighted least squares model, we run the following commands


abse=abs(resid(fit)); yhat=fitted(fit); efit=lm(abse~yhat);
w=1/fitted(efit)**2; fitw = lm(y~x1+x2+x3, weights=w); summary(fitw).


The p-value of the model utility test is reported as 1.868 x 107°. 


(c) The 95% CIs for the regression parameters are summarized in the table below.


Parameter   OLS              WLS
Intercept   (16.03, 19.02)   (16.04, 18.82)
x1          (0.43, 1.00)     (0.43, 0.94)
x2          (0.52, 2.07)     (0.41, 1.71)
x3          (-0.48, 0.18)    (-0.29, 0.19)


Clearly, WLS gives shorter CIs.
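The OLS and WLS fits of this exercise can be put side by side in a single script. The following is a sketch based on the built-in `stackloss` data and on the weighting scheme described in part (b); the names `ci_ols` and `ci_wls` are chosen here for illustration.

```r
y  <- stackloss$stack.loss
x1 <- stackloss$Air.Flow   - mean(stackloss$Air.Flow)
x2 <- stackloss$Water.Temp - mean(stackloss$Water.Temp)
x3 <- stackloss$Acid.Conc. - mean(stackloss$Acid.Conc.)

fit  <- lm(y ~ x1 + x2 + x3)               # ordinary least squares
efit <- lm(abs(resid(fit)) ~ fitted(fit))   # model the size of the errors
w    <- 1 / fitted(efit)^2                  # estimated inverse-variance weights
fitw <- lm(y ~ x1 + x2 + x3, weights = w)   # weighted least squares

ci_ols <- confint(fit)
ci_wls <- confint(fitw)

# Interval widths side by side.
cbind(OLS = ci_ols[, 2] - ci_ols[, 1], WLS = ci_wls[, 2] - ci_wls[, 1])
```

Comparing widths rather than endpoints makes the gain in precision from weighting easier to read off.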


3. (a) The plot of the residuals versus the predicted values is given below.
On the basis of this plot, the homoscedasticity assumption is suspect. To
perform a formal test, we run the commands


r1=lm(abs(rstandard(edu.fit))~poly(fitted(edu.fit),2)); summary(r1)
and the model utility test gives a p-value of 0.0002524, suggesting a violation of
the homoscedasticity assumption.
(b) To fit the weighted least squares model, we run the following commands
abse=abs(resid(edu.fit)); yhat=fitted(edu.fit); efit=lm(abse~yhat);
w=1/fitted(efit)**2; fitw = lm(Y~X1+X2+X3, weights=w, data=edu);
summary(fitw).


(c) The 95% CIs for the regression parameters are summarized in the table below.


Parameter   OLS                      WLS
Intercept   (-804.5472, -308.5889)   (-564.0155, -153.4258)
X1          (0.0490, 0.0957)         (0.0426, 0.0858)
X2          (0.9187, 2.1855)         (0.4770, 1.5470)
X3          (-0.1077, 0.0992)        (-0.0634, 0.1028)


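4. From the model (12.4.9), the cell mean for cell $(i, j)$ is

\[ \mu_{ij} = \mu_0 + \alpha_i + \beta_j, \quad i = 1, \ldots, a;\ j = 1, \ldots, b. \]

According to the regression model (12.4.11), the cell mean can also be calculated as

\[
\begin{aligned}
\mu_{ij} &= \beta_0 + \beta_i^A + \beta_j^B, && i = 1, \ldots, a-1;\ j = 1, \ldots, b-1, \\
\mu_{aj} &= \beta_0 - \beta_1^A - \cdots - \beta_{a-1}^A + \beta_j^B, && j = 1, \ldots, b-1, \\
\mu_{ib} &= \beta_0 + \beta_i^A - \beta_1^B - \cdots - \beta_{b-1}^B, && i = 1, \ldots, a-1, \\
\mu_{ab} &= \beta_0 - \beta_1^A - \cdots - \beta_{a-1}^A - \beta_1^B - \cdots - \beta_{b-1}^B.
\end{aligned}
\]

Thus, we have a system of equations

\[
\begin{aligned}
\mu_0 + \alpha_i + \beta_j &= \beta_0 + \beta_i^A + \beta_j^B, && i = 1, \ldots, a-1;\ j = 1, \ldots, b-1, \\
\mu_0 + \alpha_a + \beta_j &= \beta_0 - \beta_1^A - \cdots - \beta_{a-1}^A + \beta_j^B, && j = 1, \ldots, b-1, \\
\mu_0 + \alpha_i + \beta_b &= \beta_0 + \beta_i^A - \beta_1^B - \cdots - \beta_{b-1}^B, && i = 1, \ldots, a-1, \\
\mu_0 + \alpha_a + \beta_b &= \beta_0 - \beta_1^A - \cdots - \beta_{a-1}^A - \beta_1^B - \cdots - \beta_{b-1}^B.
\end{aligned}
\]

Summing up the $ab$ equations and using the conditions $\sum_{i=1}^a \alpha_i = 0$ and
$\sum_{j=1}^b \beta_j = 0$, we have $ab\mu_0 = ab\beta_0$, that is, $\beta_0 = \mu_0$.
Fixing $j \in \{1, \ldots, b-1\}$, summing up the $a - 1$ equations

\[ \mu_0 + \alpha_i + \beta_j = \beta_0 + \beta_i^A + \beta_j^B, \quad i = 1, \ldots, a-1, \]

and

\[ \mu_0 + \alpha_a + \beta_j = \beta_0 - \beta_1^A - \cdots - \beta_{a-1}^A + \beta_j^B, \]

and using $\sum_{i=1}^a \alpha_i = 0$ together with $\beta_0 = \mu_0$, we have $a\beta_j = a\beta_j^B$, that is, $\beta_j^B = \beta_j$ for $j = 1, \ldots, b-1$.
Since $\sum_{j=1}^b \beta_j = 0$, we have

\[ \beta_b = -\beta_1 - \cdots - \beta_{b-1} = -\beta_1^B - \cdots - \beta_{b-1}^B. \]

Similarly, it follows that $\beta_i^A = \alpha_i$ for $i = 1, \ldots, a-1$ and $\alpha_a = -\beta_1^A - \cdots - \beta_{a-1}^A$.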
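The correspondence between the two parametrizations can be illustrated numerically. The sketch below assumes that the coding in (12.4.11) is what R calls `contr.sum` (the last level coded as minus the sum of the others); the effect values are hypothetical and chosen to sum to zero.

```r
a <- 3; b <- 2
mu0   <- 10
alpha <- c(1.5, -0.5, -1.0)   # hypothetical row effects, sum to zero
beta  <- c(2, -2)             # hypothetical column effects, sum to zero

# Exact cell means of the additive model, one row per cell.
cells <- expand.grid(A = factor(1:a), B = factor(1:b))
cells$mu <- mu0 + alpha[as.integer(cells$A)] + beta[as.integer(cells$B)]

# Regression on contrast-coded indicators.
fit <- lm(mu ~ A + B, data = cells,
          contrasts = list(A = "contr.sum", B = "contr.sum"))
coef(fit)
```

The fitted coefficients reproduce $\mu_0$, $\alpha_1$, $\alpha_2$, and $\beta_1$, and the effects of the last levels follow from the zero-sum conditions.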
5. (a) We can define an indicator variable X = 1 or −1 depending on whether the
observation comes from route 1 or 2.


(b) The regression parameters $\beta_0$ and $\beta_1$ are related to the population means $\mu_1$ and
$\mu_2$ via the relations $\mu_1 = \beta_0 + \beta_1$ and $\mu_2 = \beta_0 - \beta_1$.


(c) To compare the p-value from the model utility test with that from the
two-sample t-test, we use the commands
y=dd$duration; x=rep(1, length(y)); x[which(dd$route==2)]=-1;
summary(lm(y~x)); t.test(y~dd$route, var.equal=T).
Both give the p-value of 6.609 x 10~1°. 
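The agreement between the regression test and the pooled t-test holds for any two-sample data set. The sketch below uses simulated durations, since the exercise's data are not reproduced here; the data frame is a hypothetical stand-in for dd.

```r
set.seed(1)
# Hypothetical data standing in for the exercise's data frame dd.
dd <- data.frame(route = rep(1:2, each = 15),
                 duration = c(rnorm(15, mean = 30, sd = 3),
                              rnorm(15, mean = 34, sd = 3)))

y <- dd$duration
x <- ifelse(dd$route == 1, 1, -1)   # contrast-coded indicator

fit   <- lm(y ~ x)
p_reg <- summary(fit)$coefficients["x", "Pr(>|t|)"]
p_t   <- t.test(y ~ dd$route, var.equal = TRUE)$p.value
c(regression = p_reg, pooled_t = p_t)

# The coefficients recover the two sample means: b0 + b1 and b0 - b1.
route_means <- unname(coef(fit)[1] + c(1, -1) * coef(fit)[2])
route_means
```

With a single predictor, the model utility F test and the t test on the slope have the same p-value, which in turn equals that of the equal-variance two-sample t-test.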


(d) Levene’s test returns a p-value of 0.08981 and the regression test gives
a p-value of 0.07751.
(e) For the WLS analysis, using the commands given in the hint, a p-value <
$2.2 \times 10^{-16}$ is returned; for the t-test without the equal variances assumption,
we use t.test(y~dd$route) and it returns p-value of 1.626 x 107'°. Thus, both 
methods suggest that the two population means are significantly different at
α = 0.05.


6. We use the following code


R1=rep(0, length(edu$R)); R1[which(edu$R==1)]=1; R1[which(edu$R==4)]=-1;
R2=rep(0, length(edu$R)); R2[which(edu$R==2)]=1; R2[which(edu$R==4)]=-1;
R3=rep(0, length(edu$R)); R3[which(edu$R==3)]=1; R3[which(edu$R==4)]=-1;
fit = lm(Y~X1+X2+X3+R1+R2+R3, data=edu)
abse=abs(resid(fit)); yhat=fitted(fit); efit=lm(abse~yhat); w=1/fitted(efit)**2;
fitFw = lm(Y~X1+X2+X3+R1+R2+R3, weights=w, data=edu);
fitRw = lm(Y~X1+X2+X3, weights=w, data=edu); anova(fitFw, fitRw).


The resulting p-value is 0.07833, which suggests that the variable “Region” is significant
at the 0.1 level of significance.


7. The commands show that the final model includes five predictors: mmax, cach, 
mmin, chmax, and syct. The p-values are listed as: mmax 1.18 x 1071°, cach 
5.11 x 10-®, mmin 4.34 x 107, chmax 3.05 x 1071', and syct 0.00539. The p-value 
for the model utility test is < $2.2 \times 10^{-16}$.


8. Let Population, Income, Illiteracy, Murder, HS.Grad, Frost, and Area be $x_1$ to $x_7$,
respectively.


(a) The full model is

\[
\begin{aligned}
\hat{y} = {} & 70.94 + 5.18 \times 10^{-5}x_1 - 2.18 \times 10^{-5}x_2 + 3.382 \times 10^{-2}x_3 \\
& - 0.3011x_4 + 4.893 \times 10^{-2}x_5 - 5.735 \times 10^{-3}x_6 - 7.383 \times 10^{-8}x_7,
\end{aligned}
\]

and the model without Area ($x_7$) is

\[
\begin{aligned}
\hat{y} = {} & 70.99 + 5.188 \times 10^{-5}x_1 - 2.444 \times 10^{-5}x_2 + 2.846 \times 10^{-2}x_3 \\
& - 0.3018x_4 + 4.847 \times 10^{-2}x_5 - 5.776 \times 10^{-3}x_6.
\end{aligned}
\]


(b) In the model from the last step, the variable Illiteracy ($x_3$) has the largest p-value,
0.9340; thus we update the model by h=update(h, . ~ . -Illiteracy), and
get the model

\[
\begin{aligned}
\hat{y} = {} & 71.07 + 5.115 \times 10^{-5}x_1 - 2.477 \times 10^{-5}x_2 - 0.3x_4 \\
& + 4.776 \times 10^{-2}x_5 - 5.910 \times 10^{-3}x_6.
\end{aligned}
\]

Now the variable Income ($x_2$) has the largest p-value, 0.9153. We update
the model by h=update(h, . ~ . -Income) and the obtained model is

\[ \hat{y} = 71.03 + 5.014 \times 10^{-5}x_1 - 0.3001x_4 + 4.658 \times 10^{-2}x_5 - 5.943 \times 10^{-3}x_6. \]

Now every variable has a p-value less than 0.1 and we stop removing variables.


The $R^2$ for the final model is 0.736, very close to that of the full model (0.7362).
This confirms that the removed variables contribute very little to the
model.
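The elimination steps of this exercise can be automated. The following is a minimal sketch, assuming the 0.1 threshold used above and the built-in `state.x77` data; the loop itself is an illustration and not the command sequence given in the exercise.

```r
st <- data.frame(state.x77)      # built-in data; names become Life.Exp, HS.Grad
h  <- lm(Life.Exp ~ ., data = st)

# Drop the predictor with the largest p-value until all are below 0.1.
repeat {
  pv <- summary(h)$coefficients[-1, "Pr(>|t|)"]
  if (max(pv) < 0.1) break
  worst <- names(which.max(pv))
  h <- update(h, as.formula(paste(". ~ . -", worst)))
}

names(coef(h))          # predictors kept in the final model
summary(h)$r.squared    # R^2 of the final model
```

Refitting after each removal matters, because the p-values of the remaining predictors change once a correlated variable leaves the model.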


9. We use the commands library(leaps); vs.out = regsubsets(Life.Exp~ . , nbest=3,
data=st); plot(vs.out, scale="Cp"); plot(vs.out, scale="adjr2"); plot(vs.out, scale="bic")
to create the plots for ordering the models, using $C_p$, adjusted $R^2$, and BIC,
respectively. The plots are given below.
We see that all three selection criteria give the same best set of variables,
namely the one obtained by removing Income, Illiteracy, and Area. This is
the same result as in Exercise 8.


10. (a) We use the commands library(leaps); vs.out = regsubsets(calories~ . , nbest=3,
data=uscer); plot(vs.out, scale="Cp"); plot(vs.out, scale="adjr2"); plot(vs.out,
scale="bic") to create the plots for ordering the models, using $C_p$, adjusted
$R^2$, and BIC, respectively. The plots are given below.
We see that the adjusted $R^2$ criterion gives the full model as the best, while $C_p$
and BIC give the best model as the model without the “sodium” predictor.


(b) The plot is given below. According to the rule of thumb, we identify
observations 31 and 32 in the uscer data set as influential.
(c) We remove the influential observations by the command uscer1 = uscer[-c(31,32),],
and then run the commands vs.out = regsubsets(calories~ . , nbest=3, data=uscer1);
plot(vs.out, scale="Cp"); plot(vs.out, scale="adjr2"); plot(vs.out, scale="bic")
to create the plots for ordering the models, using $C_p$, adjusted $R^2$, and BIC,
respectively. The plots are given below.


[Figure: regsubsets plots of the candidate models ordered by $C_p$, adjusted $R^2$, and BIC; the horizontal axis lists (Intercept), protein, fat, sodium, fibre, carbo, sugars, and potassium.]

Copyright ©) 2016 Pearson Education, Inc. 



We see that all three criteria give the same final model, namely
the model without “sodium,” “fiber,” and “potassium.” The final model
seems reasonable.


11. (a) The p-values for x1, x2, x3, and x4 are 0.0708, 0.5009, 0.8959, and 0.8441,
respectively. Thus, no variable is significant at level 0.05. The $R^2$ value of
0.9824 and the p-value of $4.756 \times 10^{-7}$ for the model utility test suggest that
at least some of the variables should be significant. This discrepancy is probably
due to multicollinearity.


(b) Using command vif(hc.out), the variance inflation factors for each variable are 
respectively 38.49621, 254.42317, 46.86839, and 282.51286 and they indicate 
that multicollinearity is an issue with this data. 


(c) x4 has the highest variance inflation factor and we remove it from the model
by the command hc1.out=update(hc.out, .~. -x4). In the new model, x1 and
x2 are both significant at level 0.05. The $R^2$ value and the adjusted $R^2$ are
very close to those of the full model.


(d) Starting from the full model, we first remove x3 because it has the highest
p-value. Then we remove x4, getting a model with only x1 and x2; both
variables have p-values less than 0.15. The final model has adjusted $R^2$ of
0.9744 and, compared to that of the full model (0.9824), there is not much
loss.


12. (a) The scatterplot matrix is given below.
The Salary vs Total scatterplot suggests that increasing teacher salary will
have a negative effect on student SAT scores.


Using the command summary(lm(Total~Salary, data=sat)), we fit a least
squares line through the scatterplot; the value of the slope is -5.540,
confirming our observation from the scatterplot.


(b) Using the command


summary(lm(formula = Total ~ Salary + ExpendPP + PupTeachR +
PercentEll, data = sat)),


we fit the MLR model for predicting the total SAT scores in terms of all
available covariates. The coefficient for Salary is now 1.6379. This suggests
that increasing teacher salary, while keeping all other predictor variables the
same, appears to have a positive effect on student SAT scores. This is not
compatible with the answer in part (a). The reason is that the predictor
“Salary” has strong linear dependencies with the predictors “ExpendPP” and
“PercentEll,” as demonstrated in the scatterplot matrix.


(c) Only the predictor “PercentEll” has a p-value of less than 0.05; thus it is the
only significant predictor at the 5% level.


(d) $R^2 = 0.8246$, $R^2_{adj} = 0.809$, and the p-value for the model utility test is
< $2.2 \times 10^{-16}$. These values are consistent with finding a significant variable
in part (c).
(e) Using the command vif(lm(Total~Salary+ExpendPP+PupTeachR+PercentEll,
data=sat)), the variance inflation factors for the variables Salary, ExpendPP,
PupTeachR, and PercentEll are respectively 9.217237, 9.465320, 2.433204, and
1.755090. They suggest multicollinearity of the predictors.


13. (a) The commands give $R^2 = 0.3045$, $R^2_{adj} = 0.2105$, and the p-value for the model
utility test as 0.01589. Thus, the model utility test is significant at the 0.05 level.
No individual predictor is significant at the 0.05 level.


(b) Using the command vif(lf.out), the variance inflation factors for upwid, lowid,
bklen, tarsus, and sternum are respectively 13.951765, 14.188745, 1.924634,
1.326368, and 1.203584. These values suggest that multicollinearity is an issue
with this data. A side effect of multicollinearity is a significant model utility
test when no individual predictor is significant.


(c) We use the commands library(leaps); vs.out = regsubsets(wt~ . , nbest=3,
data=lf); plot(vs.out, scale="Cp"); plot(vs.out, scale="adjr2"); plot(vs.out,
scale="bic") to create the plots for ordering the models, using $C_p$, adjusted
$R^2$, and BIC, respectively. The plots are given below.


[Figure: regsubsets plots of the candidate models ordered by $C_p$, adjusted $R^2$, and BIC; the horizontal axis lists (Intercept), upwid, lowid, bklen, tarsus, and sternum.]


We see that all three selection criteria give the same best set of variables,
namely lowid and tarsus.


(d) Using the commands lf.R=lm(wt~ lowid + tarsus, data=lf); summary(lf.R),
we get $R^2 = 0.2703$, $R^2_{adj} = 0.2338$, and a p-value for the model utility test
of 0.0018. Compared to the model obtained in part (a), the reduced model
has a somewhat smaller $R^2$, a somewhat larger $R^2_{adj}$, and a smaller p-value for the
model utility test. The new variance inflation factors are both equal to 1.0098.
Multicollinearity is not an issue now.


14. (a) We use the commands fit = glm(y~x, family=binomial()); summary(fit) to get
the fitted model as

\[ \hat{p}(x) = \frac{e^{-3.3367 + 0.8139x}}{1 + e^{-3.3367 + 0.8139x}}. \]

The p-value for testing the significance of the stress variable is 0.00786; thus
the variable is significant at level 0.05.


(b) To estimate the probability of failure at stress level 3.5, we use the
command predict(fit, list(x=3.5), type="response"), which returns the probability
as 0.380359.
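The probability in part (b) can also be recovered directly from the fitted coefficients. The sketch below uses the rounded estimates from part (a), so the result agrees with the value above only to about four decimal places; the function name `p_fail` is hypothetical.

```r
b0 <- -3.3367   # rounded intercept estimate from part (a)
b1 <- 0.8139    # rounded slope estimate from part (a)

# plogis(eta) computes exp(eta) / (1 + exp(eta)).
p_fail <- function(x) plogis(b0 + b1 * x)

p_fail(3.5)
```

Using `plogis()` avoids writing the logistic transformation by hand and is numerically stable for large linear predictors.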


(c) To fit the logistic regression model that includes a quadratic term of the stress
variable, we use the commands fit = glm(y~x+I(x**2), family=binomial());
summary(fit) and the fitted model is returned as

\[ \hat{p}(x) = \frac{e^{-2.69440 + 0.46834x + 0.04236x^2}}{1 + e^{-2.69440 + 0.46834x + 0.04236x^2}}. \]

Using the command confint(fit), we get 95% CIs for the intercept and the
coefficients of $x$ and $x^2$ as (-12.108, 4.399), (-3.357, 5.068), and (-0.494, 0.526).
Since the CI for the coefficient of $x^2$ contains zero, at the 5% significance level
we cannot reject the hypothesis that the coefficient of the quadratic component is zero.
Chapter 13


Statistical Process Control


13.2 The $\bar{X}$ Chart


1. (a) The probability that a $2.7\sigma$ $\bar{X}$ chart issues a false alarm is

\[ P_{\mu_0,\sigma}\left(\bar{X} < \mu_0 - 2.7\frac{\sigma}{\sqrt{n}}\right) + P_{\mu_0,\sigma}\left(\bar{X} > \mu_0 + 2.7\frac{\sigma}{\sqrt{n}}\right) = 2\Phi(-2.7). \]

The corresponding ARL is $1/(2\Phi(-2.7))$, which can be calculated by the
command 1/(2*pnorm(-2.7)), and the result is 144.2.


The probability that a $3.1\sigma$ $\bar{X}$ chart issues a false alarm is

\[ P_{\mu_0,\sigma}\left(\bar{X} < \mu_0 - 3.1\frac{\sigma}{\sqrt{n}}\right) + P_{\mu_0,\sigma}\left(\bar{X} > \mu_0 + 3.1\frac{\sigma}{\sqrt{n}}\right) = 2\Phi(-3.1). \]

The corresponding ARL is $1/(2\Phi(-3.1))$, which can be calculated by the
command 1/(2*pnorm(-3.1)), and the result is 516.7.


(b) From the symmetry of the standard normal distribution, for any real value $z$, 
$\Phi(-z) = 1 - \Phi(z)$. Thus, for $\Delta = -|\Delta|$, (13.2.13) becomes 

$$\Phi(-3 + \sqrt{n}|\Delta|) + 1 - \Phi(3 + \sqrt{n}|\Delta|) = 1 - \Phi(3 - \sqrt{n}|\Delta|) + 1 - [1 - \Phi(-3 - \sqrt{n}|\Delta|)]$$
$$= \Phi(-3 - \sqrt{n}|\Delta|) + 1 - \Phi(3 - \sqrt{n}|\Delta|),$$

and this finishes the proof. 


(c) We use the commands Delta=1; n=c(3:7); 
p=1+pnorm(-3-sqrt(n)*Delta)-pnorm(3-sqrt(n)*Delta); p; 1/p. The probability of an 
out-of-control signal and the corresponding ARL are listed below. 

n             3          4          5          6          7 
Probability   0.1024092  0.1586555  0.2224540  0.2909847  0.3615763 
ARL           9.764752   6.302963   4.495312   3.436606   2.765668 
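The symmetry shown in part (b) can also be checked numerically with the expression used in part (c). The following is a sketch in base R; the function name signal_prob is an illustrative choice, not a name from the book.

```r
# Probability that a 3-sigma X-bar chart signals when the mean has shifted
# by Delta standard deviations, using the expression from part (c)
signal_prob <- function(n, Delta) {
  1 + pnorm(-3 - sqrt(n) * Delta) - pnorm(3 - sqrt(n) * Delta)
}

n <- 3:7
p_pos <- signal_prob(n, Delta = 1)
p_neg <- signal_prob(n, Delta = -1)

# Part (b): the two vectors agree up to floating-point error
print(all.equal(p_pos, p_neg))
print(data.frame(n = n, probability = p_pos, ARL = 1 / p_pos))
```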


2. We load the data by the commands SSL=read.table("SqcSyringeL.txt", header=T); 
data=qcc.groups(SSL$x, SSL$sample). 

(a) We use the commands require(qcc); qcc(data[1:15,], type="xbar", newdata=data[16:47,]) 
to construct the $3\sigma$ $\bar{X}$ chart with the standard deviation estimated by 
$\hat{\sigma}_2$. The chart is given below. 


[Figure: xbar chart for data[1:15, ] (calibration data) and data[16:47, ] (new data); horizontal axis: Group; vertical axis: Group summary statistics.] 

Number of groups = 47 
Center = 5.813333 LCL = 4.332876 Number beyond limits = 11 
StdDev = 1.103468 UCL = 7.293791 Number violating runs = 25 


The chart suggests that the adjustment made after the 32nd sample brought 
the subgroup means within the control limits. Even so, the adjustment did 
not bring the process back in control since there are more than eight points 
on the same side of the central line. 
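The control limits printed under the chart can be reproduced from the reported center and standard deviation. The sketch below is base R only and assumes a subgroup size of 5, which we infer from the 235 measurements in 47 groups used in part (d); the variable names are illustrative.

```r
# Values printed by qcc for the calibration data in part (a)
center <- 5.813333
sigma_hat <- 1.103468
n <- 5  # subgroup size, inferred from 235 measurements in 47 groups

# 3-sigma limits for the subgroup means
half_width <- 3 * sigma_hat / sqrt(n)
limits <- c(LCL = center - half_width, UCL = center + half_width)
print(limits)
```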


(b) We use the command qcc(data[c(1, 3:15),], type="xbar", newdata=data[16:47,]) 
to construct the $3\sigma$ $\bar{X}$ chart with the standard deviation estimated by 
$\hat{\sigma}_2$, with the second point deleted. The chart is given below. 


[Figure: xbar chart for data[c(1, 3:15), ] (calibration data) and data[16:47, ] (new data); horizontal axis: Group; vertical axis: Group summary statistics.] 

Number of groups = 46 
Center = 5.928571 LCL = 4.470087 Number beyond limits = 9 
StdDev = 1.08709 UCL = 7.387056 Number violating runs = 21 


Compared with part (a), the conclusions do not change. 

(c) To construct the $3\sigma$ $\bar{X}$ chart with the standard deviation estimated by 
$\hat{\sigma}_1$, we use the command 
qcc(data[1:15,], type="xbar", newdata=data[16:47,], std.dev="UWAVE-SD"), 
and the chart is given below. 


[Figure: xbar chart for data[1:15, ] (calibration data) and data[16:47, ] (new data); horizontal axis: Group; vertical axis: Group summary statistics.] 

Number of groups = 47 
Center = 5.813333 LCL = 4.322005 Number beyond limits = 10 
StdDev = 1.111571 UCL = 7.304662 Number violating runs = 25 


With the second point deleted, the command is 
qcc(data[c(1, 3:15),], type="xbar", newdata=data[16:47,], std.dev="UWAVE-SD"), 
and the chart is given below. 


[Figure: xbar chart for data[c(1, 3:15), ] (calibration data) and data[16:47, ] (new data); horizontal axis: Group; vertical axis: Group summary statistics.] 

Number of groups = 46 
Center = 5.928571 LCL = 4.455374 Number beyond limits = 7 
StdDev = 1.098057 UCL = 7.401769 Number violating runs = 21 


By comparing these charts to those in parts (a) and (b), we can see that all 
the conclusions are the same. 


(d) We use the commands 

x=SSL$x/100+4.9; sum(x[161:235]>=4.92 & x[161:235]<=4.98) 

to get the number of measurements within the limits. The commands return 74. 
Since there are 75 measurements after the adjustment, the process yield after 
the adjustment is 74/75 = 98.67%. 


3. We load the data by the commands SCV=read.table("SqcCoolVisc.txt", header=T); 
x=SCV$x. 

(a) We use the command qcc(x, type="xbar.one") to get the $3\sigma$ $\bar{X}$ chart 
with the center and standard deviation computed from the entire data set, and it 
is shown below. 


[Figure: xbar.one chart for x; horizontal axis: Group; vertical axis: Group summary statistics.] 

Number of groups = 50 
Center = 2.8144 LCL = 2.569611 Number beyond limits = 18 
StdDev = 0.08159647 UCL = 3.059189 Number violating runs = 25 


There are 18 points, marked in red, falling outside the control limits. There 
are 11 points, marked in yellow, suggesting an out-of-control state according 
to the General Electric supplemental rules. 
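For individual observations, the limits are the center plus or minus three estimated standard deviations. The sketch below illustrates the calculation in base R under the assumption that the standard deviation is estimated from moving ranges of span 2 divided by $d_2 = 1.128$, which we believe is the default used by qcc for type="xbar.one". The data vector is hypothetical and is not the SqcCoolVisc data.

```r
# Hypothetical individual measurements (not the SqcCoolVisc data)
x <- c(2.8, 2.9, 2.7, 3.0, 2.8)

# Moving ranges of span 2 and the unbiasing constant d2 = 1.128 for n = 2
mr <- abs(diff(x))
sigma_hat <- mean(mr) / 1.128

# 3-sigma limits for individual observations
center <- mean(x)
limits <- c(LCL = center - 3 * sigma_hat, UCL = center + 3 * sigma_hat)
print(c(center = center, sigma_hat = sigma_hat, limits))
```

For the chart in part (a), the same arithmetic gives 2.8144 - 3(0.08159647) = 2.569611, which matches the printed LCL.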


(b) We use the command qcc(x[1:25], type="xbar.one", newdata=x[26:length(x)]) 
to get the $3\sigma$ $\bar{X}$ chart with the center and standard deviation computed from 
the first 25 observations, taken when the process is believed to be in control, 
and it is shown below. 

[Figure: xbar.one chart for x[1:25] (calibration data) and x[26:length(x)] (new data); horizontal axis: Group.] 

Number of groups = 50 
Center = 2.9936 LCL = 2.691073 Number beyond limits = 18 
StdDev = 0.1008422 UCL = 3.296127 Number violating runs = 29 


There are 18 points, marked in red, falling outside the control limits. There 
are 12 points, marked in yellow, suggesting an out-of-control state according 
to the General Electric supplemental rules. 


4. We load the data by the commands SLP=read.table("SqcLaborProd.txt", header=T); 
x=SLP$x. 

(a) We use the command qcc(x[1:30], type="xbar.one", newdata=x[31:length(x)]) 
to get the $3\sigma$ $\bar{X}$ chart with the center and standard deviation computed from 
the first 30 observations, and it is shown below. 


[Figure: xbar.one chart for x[1:30] (calibration data) and x[31:length(x)] (new data); horizontal axis: Group.] 

Number of groups = 48 
Center = 6.746667 LCL = 1.491714 Number beyond limits = 8 
StdDev = 1.751651 UCL = 12.00162 Number violating runs = 3 


The graph shows that, during the calibration period, there are two points, 
marked in yellow, suggesting an out-of-control state according to the General 
Electric supplemental rules. After the calibration period, there are eight 
points, marked in red, falling outside the control limits. Thus, the process 
is not in control either during the calibration period or after the calibration 
period. 


(b) We use the command 

qcc(x[c(1:30)[-c(12,13)]], type="xbar.one", newdata=x[31:length(x)]) 

to get the $3\sigma$ $\bar{X}$ chart with the center and standard deviation computed from 
the first 30 observations, with the 12th and 13th daily measurements removed, 
and it is shown below. 


[Figure: xbar.one chart for x[c(1:30)[-c(12, 13)]] (calibration data) and x[31:length(x)] (new data); horizontal axis: Group; vertical axis: Group summary statistics.] 

Number of groups = 46 
Center = 6.821429 LCL = 1.275723 Number beyond limits = 8 
StdDev = 1.848568 UCL = 12.36713 Number violating runs = 1 


The graph shows that, during the calibration period, the process is in control. 
After the calibration period, there are eight points, marked in red, falling 
outside the control limits. Thus, the process is not in control after the 
calibration period. 
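The index expression c(1:30)[-c(12,13)] used in part (b) first builds the calibration indices and then drops the 12th and 13th entries by negative indexing. A short base R check (the name idx is illustrative):

```r
# Indices of the calibration period with days 12 and 13 dropped
idx <- c(1:30)[-c(12, 13)]
print(idx)
print(length(idx))  # number of calibration points that remain
```

The 28 remaining calibration points, together with the 18 new observations, account for the 46 groups reported under the chart.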


13.3 The S and R Charts 


1. We load the data by the commands SSL=read.table("SqcSyringeL.txt", header=T); 
data=qcc.groups(SSL$x, SSL$sample). We use the commands 

require(qcc); qcc(data[1:15,], type="S", newdata=data[16:47,]) 

to construct the S chart, and the command 

qcc(data[1:15,], type="R", newdata=data[16:47,]) 

to construct the R chart. These charts are given below. 


[Figure: S chart for data[1:15, ] (calibration data) and data[16:47, ] (new data); horizontal axis: Group.] 

Number of groups = 47 
Center = 1.04486 LCL=0 Number beyond limits = 0 
StdDev = 1.111571 UCL = 2.182711 Number violating runs = 19 

[Figure: R chart for data[1:15, ] (calibration data) and data[16:47, ] (new data); horizontal axis: Group.] 

Number of groups = 47 
Center = 2.566667 LCL=0 Number beyond limits = 1 
StdDev = 1.103468 UCL = 5.427139 Number violating runs = 19 


Both charts show most points after the calibration period to be below the center 
line. It appears that the adjustment had no effect on the process variability. 
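The numbers printed under the S chart can be reproduced from its center line. The sketch below is base R only and assumes normal data and a subgroup size of 5 (inferred from 235 measurements in 47 groups); the helper name c4 follows the usual notation for the unbiasing constant and is defined here rather than taken from qcc.

```r
# Unbiasing constant c4 for the sample standard deviation of n normal values
c4 <- function(n) sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)

s_bar <- 1.04486  # center line of the S chart reported above
n <- 5            # subgroup size (assumed; 235 measurements in 47 groups)

# Estimate of sigma and the 3-sigma limits for S, truncated at zero
sigma_hat <- s_bar / c4(n)
width <- 3 * sigma_hat * sqrt(1 - c4(n)^2)
limits <- c(LCL = max(0, s_bar - width), UCL = s_bar + width)
print(c(sigma_hat = sigma_hat, limits))
```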


2. We load the data by the commands SSD=read.table("SqcSemicondDiam.txt", header=T); 
attach(SSD); diameter=qcc.groups(x, lot). 

(a) We use the command qcc(diameter, type="S") to get the S chart and 
qcc(diameter, type="R") to get the R chart, and they are shown below. 


[Figure: S chart for diameter; horizontal axis: Group; vertical axis: Group summary statistics.] 

Number of groups = 20 
Center = 2.758494 LCL = 0 Number beyond limits = 0 
StdDev = 3.45726 UCL = 9.01071 Number violating runs = 0 

[Figure: R chart for diameter; horizontal axis: Group; vertical axis: Group summary statistics.] 

Number of groups = 20 
Center = 3.9011 LCL = 0 Number beyond limits = 0 
StdDev = 3.458422 UCL = 12.74605 Number violating runs = 0 


Clearly, these charts suggest that the process variability is in control. 


(b) We use the command qcc(diameter, type="xbar", std.dev="UWAVE-R") to 
construct the $3\sigma$ $\bar{X}$ chart with the standard deviation estimated by 
$\hat{\sigma}_2$ given in (13.2.5). The chart is given below. 

[Figure: xbar chart for diameter; horizontal axis: Group.] 

Number of groups = 20 
Center = 174.3422 LCL = 167.0058 Number beyond limits = 4 
StdDev = 3.458422 UCL = 181.6786 Number violating runs = 0 


This chart shows that the process mean is out of control. 


(c) We use the command qcc(diameter, type="xbar", std.dev="UWAVE-SD") to 
construct the $3\sigma$ $\bar{X}$ chart with the standard deviation estimated by 
$\hat{\sigma}_1$ given in (13.2.5). The chart is given below. 


[Figure: xbar chart for diameter; horizontal axis: Group.] 



Number of groups = 20 
Center = 174.3422 LCL = 167.0082 Number beyond limits = 4 
StdDev = 3.45726 UCL = 181.6762 Number violating runs = 0 


Compared with the chart in part (b), the conclusion does not change. 

3. Using the commands given in the book, the constructed $\chi^2$-based S chart for the 
pistonrings data is shown below. 


[Figure: Chi square S chart with UCL and LCL; vertical axis: sdv.] 



13.4 The p and c Charts 


1. To construct a $3\sigma$ p chart using the first 24 samples in the orangejuice2 data frame 
as the calibration data, we use the following commands: 

data(orangejuice2); attach(orangejuice2); 
qcc(D[trial], sizes=size[trial], type="p", newdata=D[!trial], newsizes=size[!trial]). 

The p chart is given below. 

[Figure: p chart for D[trial] (calibration data) and D[!trial] (new data); horizontal axis: Group; vertical axis: Group summary statistics.] 

Number of groups = 64 


Center = 0.1108333  LCL=0 Number beyond limits = 0 
StdDev = 0.3139256 UCL=0.2440207 Number violating runs = 1 


This chart suggests that the process remains in control. 
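The p chart limits printed above follow from the binomial standard error of a sample proportion. The sketch below is base R only and assumes every sample contains n = 50 items; this value is our assumption (it is consistent with the reported StdDev and UCL) and is not stated in the solution.

```r
p_bar <- 0.1108333  # center line reported above
n <- 50             # sample size assumed here for every sample

# Standard error of a sample proportion and 3-sigma limits, kept within [0, 1]
se <- sqrt(p_bar * (1 - p_bar) / n)
limits <- c(LCL = max(0, p_bar - 3 * se), UCL = min(1, p_bar + 3 * se))
print(limits)
```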


2. We use the commands data(pcmanufact); attach(pcmanufact) to import the data into the R 
session. We use the commands require(qcc); qcc(x, sizes=size, type="c") to construct 
a $3\sigma$ c chart, with $\lambda$ estimated from the entire data set, and the $3\sigma$ u chart is 
constructed by the command qcc(x, sizes=size, type="u"). The charts are given 
below. 


[Figure: c chart for x; horizontal axis: Group; vertical axis: Group summary statistics.] 

Number of groups = 20 
Center = 9.65 LCL = 0.3306653 Number beyond limits = 0 
StdDev = 3.106445 UCL = 18.96933 Number violating runs = 0 

[Figure: u chart for x; horizontal axis: Group; vertical axis: Group summary statistics.] 

Number of groups = 20 
Center = 1.93 LCL = 0.06613305 Number beyond limits = 0 
StdDev = 3.106445 UCL = 3.793867 Number violating runs = 0 


When the Poisson count pertains to the total number of nonconformities in batches 
of items, a u chart plots the average number of nonconformities per unit. These 
charts show that the process is in control. 
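The relation between the c chart and the u chart can be seen by recomputing both sets of limits from the reported center line. This is a sketch in base R; the number of units per sample, m = 5, is an assumption inferred from the ratio 9.65 / 1.93 of the two center lines.

```r
c_bar <- 9.65  # average count per sample reported above
m <- 5         # units per sample, assumed from 9.65 / 1.93

# c chart: a Poisson count with mean lambda has standard deviation sqrt(lambda)
c_limits <- c(LCL = max(0, c_bar - 3 * sqrt(c_bar)),
              UCL = c_bar + 3 * sqrt(c_bar))

# u chart: counts per unit, so the center and the limits are divided by m
u_bar <- c_bar / m
u_limits <- c(LCL = max(0, u_bar - 3 * sqrt(u_bar / m)),
              UCL = u_bar + 3 * sqrt(u_bar / m))
print(rbind(c_chart = c_limits, u_chart = u_limits))
```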


3. The u chart with unequal sample sizes can be obtained by the following 
commands: data(dyedcloth); attach(dyedcloth); qcc(x, sizes=size, type="u"). The chart 
is given below. 


[Figure: u chart for x; horizontal axis: Group; vertical axis: Group summary statistics.] 



Number of groups = 10 
Center = 1.423256 LCL is variable Number beyond limits = 0 
StdDev = 3.986022 UCL is variable Number violating runs = 0 

We notice that, due to the unequal sample sizes, the UCL and LCL are not constant 
anymore. This chart shows that the process is in control. 


13.5 CUSUM and EWMA Charts 


1. We load the data by the commands SSD=read.table("SqcSemicondDiam.txt", header=T); 
attach(SSD); diameter=qcc.groups(x, lot). 

(a) We use the commands require(qcc); cusum(diameter, std.dev="UWAVE-SD") to 
construct a CUSUM chart for the semiconductor wafer diameter data, which 
is shown below. The chart shows that the process is out of control since there 
are five points outside the control limits. 


[Figure: cusum chart for diameter; horizontal axis: Group; vertical axis: Cumulative Sum (above target / below target).] 

Number of groups = 20 Decision interval (std. err.) = 5 
Center = 174.3422 Shift detection (std. err.) = 1 
StdDev = 3.45726 No. of points beyond boundaries 


(b) We use the command ewma(diameter) to construct an EWMA chart for the 
semiconductor wafer diameter data, which is shown below. The chart shows that 
the process is out of control since there is one point outside the control limits. 

[Figure: EWMA chart for diameter; horizontal axis: Group.] 

Number of groups = 20 Smoothing parameter = 0.2 
Center = 174.3422 Control limits at 3*sigma 
StdDev = 3.458422 No. of points beyond limits = 1 
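To make explicit what the EWMA chart plots, the recursion and its time-varying limits can be written out in a few lines of base R. This is a sketch with hypothetical numbers, not the SqcSemicondDiam data; the function name ewma_stat is our own, and we assume the smoothing parameter 0.2 and 3-sigma limits reported under the chart.

```r
# EWMA recursion z_i = lambda * x_i + (1 - lambda) * z_{i-1}, started at the center
ewma_stat <- function(x, lambda = 0.2, start = mean(x)) {
  z <- numeric(length(x))
  prev <- start
  for (i in seq_along(x)) {
    z[i] <- lambda * x[i] + (1 - lambda) * prev
    prev <- z[i]
  }
  z
}

# Hypothetical subgroup means (not the SqcSemicondDiam data)
x <- c(10, 12, 11, 15, 14)
z <- ewma_stat(x, lambda = 0.2, start = 10)

# Time-varying 3-sigma limits; sigma_x is the standard error of each plotted point
sigma_x <- 2
i <- seq_along(x)
half_width <- 3 * sigma_x * sqrt(0.2 / (2 - 0.2) * (1 - (1 - 0.2)^(2 * i)))
print(cbind(z = z, LCL = 10 - half_width, UCL = 10 + half_width))
```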


2. We load the data by the commands SRP=read.table("SqcRedoxPotent.txt", header=T); 
attach(SRP); require(qcc). To construct the CUSUM and EWMA charts for the 
chlorine data, we use cusum(x) and ewma(x), respectively, and the charts are given 
below. 


[Figure: cusum chart for x with UDB and LDB; horizontal axis: Group; vertical axis: Cumulative Sum (above target / below target).] 

Number of groups = 10 Decision interval (std. err.) = 5 
Center = 15.3 Shift detection (std. err.) = 1 
StdDev = 4.826635 No. of points beyond boundaries 

[Figure: EWMA chart for x with UCL and LCL; horizontal axis: Group; vertical axis: Group Summary Statistics.] 

Number of groups = 10 Smoothing parameter = 0.2 
Center = 15.3 Control limits at 3*sigma 
StdDev = 4.826635 No. of points beyond limits = 0 


The charts suggest the process mean is in control. 
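The CUSUM statistics behind such a chart can be sketched in base R as follows. This is a tabular two-sided CUSUM on standardized values, assuming the reference value k is half of the shift to be detected (1 standard error, so k = 0.5) and a decision interval of 5 standard errors, as reported under the chart. The data and the function name cusum_stat are hypothetical, and the sketch is not the internal implementation of qcc.

```r
# Tabular (two-sided) CUSUM on standardized values z_i = (x_i - center) / std.err
cusum_stat <- function(z, k = 0.5) {
  pos <- neg <- numeric(length(z))
  cp <- 0
  cn <- 0
  for (i in seq_along(z)) {
    cp <- max(0, cp + z[i] - k)  # accumulates evidence of an upward shift
    cn <- min(0, cn + z[i] + k)  # accumulates evidence of a downward shift
    pos[i] <- cp
    neg[i] <- cn
  }
  data.frame(pos = pos, neg = neg)
}

# Hypothetical standardized observations (not the SqcRedoxPotent data)
z <- c(0.2, 1.5, 2.0, 1.8, 2.5, -1.0)
out <- cusum_stat(z, k = 0.5)
h <- 5  # decision interval, in standard errors
out$signal <- out$pos > h | out$neg < -h
print(out)
```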
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